We consider the problem of estimating the state of a large but finite number $N$ of identical
quantum systems. As $N$ becomes large the problem simplifies dramatically. The only relevant
measure of the quality of estimation becomes the mean quadratic error matrix. Here we present
a bound on this quantity: a new quantum Cramér-Rao inequality. The new bound expresses
succinctly how in the quantum case one can trade information about one parameter for information
about another. The bound holds for arbitrary measurements on pure states, but only for separable
measurements on mixed states: a striking example of non-locality without entanglement for mixed
but not for pure states. Cramér-Rao bounds are generally only derived for unbiased estimators.
Here we give a version of our bound for biased estimators, and a simple asymptotic version for
large $N$. Finally we prove that when the unknown state belongs to a two dimensional Hilbert space
our quantum Cramér-Rao bound can always be attained and we provide an explicit measurement
strategy that attains it. Thus we have a complete solution to the problem of estimating as efficiently
as possible the unknown state of a large ensemble of qubits in the same pure state. The same is
true for qubits in the same mixed state if one restricts oneself to separable measurements, but
non-separable measurements allow a dramatic increase in efficiency. Exactly how much increase is
possible is a major open problem.
I. INTRODUCTION

One of the central problems of quantum measurement theory is the estimation of an unknown quantum state.
Originally only of theoretical interest, this problem is becoming of increasing practical importance. Indeed there are
now several beautiful experimental realizations of quantum state reconstruction in such diverse systems as quantum
optics [?], molecular states [?], trapped ions [?] and atoms in motion [?].

The theoretical work which is the basis for these experiments is concerned with devising measurement strategies 
that are simple to realize experimentally and which allow an unambiguous reconstruction of the quantum state. The 
best known such technique is quantum state tomography [?], adapted in [?] for the case of finite dimensional Hilbert
spaces. Other techniques are also available; see [?] for a recent discussion in the case of finite dimensional
Hilbert spaces. However, all these works suppose that the measurements are perfect and that any operator can be
measured with infinite precision. In general the quality of the reconstruction will be limited by experimental
errors or by finite statistics. The present work is devoted to studying this latter aspect when the unknown state
belongs to a finite dimensional Hilbert space.

Thus the setting of the problem is that we have at our disposal a finite number $N$ of copies of an unknown quantum
state $\rho$ (pure or mixed). Our task is to determine $\rho$ as well as possible. This is by now a classical problem [?].

A common approach is first to specify a cost function which numerically quantifies the deviation of the estimate 
from the true state. One then tries to devise a measurement and estimation strategy which minimizes the mean cost. 
Since the mean cost typically depends on the unknown state itself, one typically averages over all possible states to
arrive at a single number expressing the quality of the estimation. However optimal strategies have only been found
in some simple highly symmetric cases (the covariant measurements of [?], see also [?]).

When the number of copies $N$ becomes large, however, one can hope that the problem becomes simpler so that one
might be able to find the optimal strategies in this limit. The reason for this is that in the large $N$ limit the estimation
problem ceases to be a 'global' problem and becomes 'local'. Indeed for small $N$ the estimated state will often be
very different from the true state. Hence the optimal measurement strategy must take into account the behaviour of
the cost function for large estimation errors. On the other hand in the limit of an infinite number of copies any two
states can be distinguished with certainty. So the relevant question to ask about the estimation strategy is at what
rate it distinguishes neighboring states. And in that case we are only concerned with the behaviour of the estimator
and of the cost function very close to the true value.

To formulate the problem with precision, let us suppose that the unknown state $\rho(\theta)$ depends on a vector of
$p$ unknown real parameters $\theta = (\theta_1, \ldots, \theta_p)$. For instance the $\theta_i$ could correspond to various settings or physical
properties of the apparatus that produces the state $\rho$. After carrying out a measurement on the $N$ copies of $\rho$, one
will guess the value of $\theta$. Call $\hat\theta^N = (\hat\theta^N_1, \ldots, \hat\theta^N_p)$ the guessed value. For a good estimation strategy we expect the mean
quadratic error (m.q.e.) to decrease as $1/N$:

$$E_\theta\big((\hat\theta^N_i - \theta_i)(\hat\theta^N_j - \theta_j)\big) \equiv V^N_{ij}(\theta) \simeq \frac{W_{ij}(\theta)}{N} \qquad (1)$$

where the scaled m.q.e. matrix $W(\theta) = (W_{ij}(\theta)) \simeq N V^N(\theta)$ does not depend on $N$. $E_\theta$ denotes the mean taken over
repetitions of the measurement with the value of $\theta$ fixed.

Consider now a smooth cost function $f(\hat\theta, \theta)$, which measures how much the estimated value $\hat\theta$ differs from the true
value $\theta$ of the parameter. $f$ has a minimum at $\hat\theta = \theta$, hence can be expanded as

$$f(\hat\theta, \theta) = f_0(\theta) + \sum_{ij} C_{ij}(\theta)(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) + O(\|\hat\theta - \theta\|^3) \qquad (2)$$

where $C(\theta) = (C_{ij}(\theta))$ is a nonnegative matrix. Thus for a reasonable estimation strategy the mean value of the cost
will decrease as

$$E_\theta\big(f(\hat\theta^N, \theta)\big) = f_0(\theta) + N^{-1}\sum_{ij} C_{ij}(\theta) W_{ij}(\theta) + o(N^{-1}) \qquad (3)$$

since we expect the expectation value of higher order terms in $\hat\theta^N - \theta$ to decrease faster than $1/N$. The problem has
become local: only the quadratic cost matrix $C(\theta)$ and the scaled mean quadratic error matrix $W(\theta)$ at $\theta$ intervene.
The essential question about state estimation for large ensembles is therefore: what scaled m.q.e. matrices $W(\theta)$ are
attainable through arbitrary measurement and estimation procedures? In particular, what does the boundary of this
set of attainable m.q.e. matrices look like?

In the case when the parameter $\theta$ is one-dimensional ($p = 1$) the problem has been solved: a bound on the variance
of unbiased estimators, the quantum Cramér-Rao bound, was given in [?], and a strategy for attaining the bound in
the large $N$ limit was proposed in [15]. This justifies taking the bound to induce a 'distinguishability metric' on the
space of states [?]. In the case of a multidimensional parameter however, though different bounds for the matrix
$W$ have been established, in general they are not tight [?].

In this paper we present a new bound for $W$ in the multiparameter case which is inspired by the discussion in [15].
This bound expresses in a natural way how one can trade information about one parameter for information about
another. The interest of this new bound depends on the precise problem one is considering:
• When $\rho(\theta) = |\psi(\theta)\rangle\langle\psi(\theta)|$ is a pure state belonging to a 2 dimensional Hilbert space, the bound is sharp: it
provides a necessary and sufficient condition that $W$ must satisfy in order to be attainable. Furthermore, the
bound can be attained by carrying out separate measurements on each particle. This completely solves the
problem of estimating the state of a large ensemble of spin 1/2 particles (qubits) in the same pure state.

• When $\rho(\theta)$ is a pure state belonging to a Hilbert space of dimension $d$ larger than 2, then our bound on $W$
applies, but it is not sharp.

• When the unknown state is mixed and belongs to a 2 dimensional Hilbert space, and if one restricts oneself to
measurements that act separately on each particle, then our bound applies and is sharp.

• When the unknown state is mixed and belongs to a Hilbert space of dimension $d > 2$, and if one restricts oneself
to measurements that act separately on each particle, then our bound applies but is not sharp.

• If the unknown state is mixed and one allows collective measurements, then our bound is not necessarily satisfied.

This last point is surprising and points to a fundamental difference between measuring pure states and mixed states.
Indeed it is known that measurements on several identical copies of the same pure state can generally
be done better with collective measurements on the different copies [?]. This is known as 'non-locality without
entanglement' [?]. The first point shows that in the limit of a large number of copies, pure states of spin 1/2 do not
exhibit non-locality without entanglement. On the other hand the last point shows that in the limit of a large number
of copies mixed states of spin 1/2 continue to exhibit non-locality without entanglement.

To describe our bound on $W$, we first consider for simplicity the case of a pure state of spin 1/2 particles. Suppose
the unknown state is a spin 1/2 known to be in a pure state, and the state is known to be almost pointing in the $+z$
direction:

$$|\psi(\theta_1, \theta_2)\rangle \simeq |\uparrow\rangle + \tfrac{1}{2}(\theta_1 + i\theta_2)|\downarrow\rangle \qquad (4)$$

where we have written an expression valid to first order in $\theta_1, \theta_2$. Suppose we carry out a measurement of the
operator $\sigma_x$. We obtain the outcome $\pm x$ with probability $p(\pm x) = (1 \pm \theta_1)/2$. Thus the outcome of this measurement
tells us about the value of $\theta_1$. Similarly we can carry out a measurement of $\sigma_y$. We obtain the outcome $\pm y$ with
probability $p(\pm y) = (1 \pm \theta_2)/2$. The outcome of this measurement tells us about $\theta_2$. But the measurements $\sigma_x$ and
$\sigma_y$ are incompatible, i.e., the operators do not commute and cannot be measured simultaneously. Thus if one obtains
knowledge about $\theta_1$, it is at the expense of $\theta_2$. Indeed suppose one has $N$ copies of the state $\psi$ and one measures $\sigma_x$
on $N_1$ copies and $\sigma_y$ on $N_2 = N - N_1$ copies. Our estimator for $\theta_1$ is the fraction of $+x$ outcomes minus the fraction
of $-x$ outcomes. This estimator is unbiased. The resulting uncertainty (at the point $\theta_1 = \theta_2 = 0$) about $\theta_1$ is then
$E_\theta((\hat\theta_1 - \theta_1)^2) = 1/N_1$. Similarly we can estimate $\theta_2$ and the corresponding uncertainty is $E_\theta((\hat\theta_2 - \theta_2)^2) = 1/N_2$. We can
combine these two expressions in the following relation:

$$\frac{1}{E_\theta((\hat\theta_1 - \theta_1)^2)} + \frac{1}{E_\theta((\hat\theta_2 - \theta_2)^2)} = \frac{1}{V^N_{11}} + \frac{1}{V^N_{22}} = N_1 + N_2 = N \qquad (5)$$
which expresses in a compact form how we can trade knowledge about $\theta_1$ for knowledge about $\theta_2$. We shall show that
it is impossible to do better than (5) when one restricts attention to unbiased estimators based on arbitrary
measurements, and asymptotically not possible to do better with any estimator whatsoever.
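
The trade-off (5) can be checked with exact arithmetic. The following Python sketch is an illustration only: the names `estimator_variance` and `information_sum` and the choice $N = 100$ are hypothetical, and the code assumes the two-outcome model described above, in which each outcome is $\pm 1$ with probability $(1 \pm \theta)/2$.

```python
from fractions import Fraction


def estimator_variance(theta, n_copies):
    """Exact variance of (fraction of + outcomes) - (fraction of - outcomes)
    when each of n_copies outcomes is +1 with probability (1 + theta) / 2."""
    theta = Fraction(theta)
    # A single +/-1 outcome has mean theta and variance 1 - theta^2.
    return (1 - theta ** 2) / n_copies


def information_sum(n_total, n_x, theta1=0, theta2=0):
    """Left-hand side of (5): 1/Var(theta1 estimate) + 1/Var(theta2 estimate)."""
    n_y = n_total - n_x  # copies left over for the sigma_y measurement
    var1 = estimator_variance(theta1, n_x)
    var2 = estimator_variance(theta2, n_y)
    return 1 / var1 + 1 / var2


# Hypothetical splits of N = 100 copies, evaluated at theta1 = theta2 = 0.
for n_x in (10, 50, 90):
    print(n_x, information_sum(100, n_x))
```

At $\theta_1 = \theta_2 = 0$ the sum equals $N$ whatever the split: devoting more copies to $\sigma_x$ lowers the variance of $\hat\theta_1$ only by raising that of $\hat\theta_2$.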

To generalize (5), we rewrite it in a more abstract form, and state it as an inequality. We use polar coordinates to
parameterize the unknown state of the spin 1/2 particle: $|\psi\rangle = \cos\frac{\eta}{2}|\uparrow\rangle + \sin\frac{\eta}{2}e^{i\varphi}|\downarrow\rangle$. We introduce the tensor

$$H_{\eta\eta} = 1, \qquad H_{\varphi\varphi} = \sin^2\eta, \qquad H_{\eta\varphi} = 0 \qquad (6)$$

which is simply the Euclidean metric on the sphere. Then the bound (5) can be reexpressed as

$$\mathrm{tr}\, H^{-1}(V^N)^{-1} \le N \qquad (7)$$

where $V^N$ is the m.q.e. matrix defined in (1).

For mixed states belonging to a 2 dimensional Hilbert space, (7) can be generalized as follows. Let us suppose that
the state $\rho(\theta)$ depends on three unknown parameters. Then we can parameterize it by $\rho(\theta) = \frac{1}{2}(I + \sum_i \theta_i\sigma_i)$ where $I$
is the identity matrix, $\sigma_i$ are the Pauli matrices and the 3 parameters $\theta_i$ obey $\|\theta\|^2 = \sum_i \theta_i^2 < 1$. We now introduce
the tensor

$$H_{ij}(\theta) = \delta_{ij} + \frac{\theta_i\theta_j}{1 - \|\theta\|^2} \qquad (8)$$

which generalizes the tensor (6) to the case of mixed states. Then, upon restricting oneself to separable measurements,
we will show that the m.q.e. matrix $V^N$ must satisfy (exactly for unbiased estimators, and otherwise asymptotically)

$$\mathrm{tr}\, H(\theta)^{-1}V^N(\theta)^{-1} \le N. \qquad (9)$$

As an application of these results, the minimum of the mean cost (3) in the case of spin 1/2 particles (for mixed
states restricting oneself to separable measurements) is

$$\min E_\theta\big(f(\hat\theta, \theta)\big) = f_0(\theta) + \frac{\Big(\mathrm{tr}\sqrt{H(\theta)^{-1/2}C(\theta)H(\theta)^{-1/2}}\Big)^2}{N} + o(1/N) \qquad (10)$$

which is obtained simply by minimizing (3) subject to the constraints (7) or (9).

We can compare (10) with the exact results which are known in the case of covariant measurements on pure states
of spin 1/2 particles [10,11]. In this problem one is given $N$ spin 1/2 particles polarized along the direction $\Omega$. The
directions $\Omega$ are uniformly distributed on the sphere. One wants to devise a measurement and estimation strategy that
maximizes the mean value of the fidelity $\cos^2(\omega/2)$, that is, minimizes the mean cost $1 - \cos^2(\omega/2)$, where $\omega$ is the
angle between the estimated direction $\hat\Omega$ and the true direction $\Omega$. Expanding the cost function to second order in
$\omega$ (to obtain the quadratic cost matrix $C$) and averaging (10) over the sphere, one finds

$$E\big(\cos^2(\omega/2)\big) \le 1 - \frac{1}{N} + o\Big(\frac{1}{N}\Big) \qquad (11)$$

which in the limit of large $N$ coincides with the results (exact for all $N$) of [10,11]. If the directions $\Omega$ are not
uniformly distributed, then [10,11] do not apply, but (10) stays valid. However we cannot compare our results with
the recent analysis of covariant measurements on mixed states [13] because we suppose separability of the measurement,
whereas [12] does not.
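
As a numerical illustration of the $1/N$ coefficient in (10), the sketch below evaluates $(\mathrm{tr}\sqrt{H^{-1/2}CH^{-1/2}})^2$ for the pure spin 1/2 problem just described. It rests on stated assumptions rather than on anything given explicitly in the text: the quadratic cost is taken to be $\omega^2/4$ (the second order expansion of $1 - \cos^2(\omega/2)$), so that $C = H/4$ with $H$ the tensor (6); the polar angle value and the helper names `sqrtm_psd` and `min_cost_coefficient` are hypothetical.

```python
import numpy as np


def sqrtm_psd(a):
    """Square root of a real symmetric positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    # v @ diag(sqrt(w)) @ v.T, clipping tiny negative eigenvalues to zero.
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def min_cost_coefficient(h, c):
    """(tr sqrt(H^{-1/2} C H^{-1/2}))^2, the coefficient of 1/N in (10)."""
    h_inv_sqrt = np.linalg.inv(sqrtm_psd(h))
    a = h_inv_sqrt @ c @ h_inv_sqrt
    return np.trace(sqrtm_psd(a)) ** 2


eta = 0.7                                # hypothetical polar angle
h = np.diag([1.0, np.sin(eta) ** 2])     # tensor (6) in (eta, phi) coordinates
c = h / 4.0                              # assumed quadratic cost: omega^2 / 4
print(min_cost_coefficient(h, c))
```

Under these assumptions the coefficient equals 1 for every $\eta$, so the average over the sphere is also 1, in agreement with the $1/N$ term of (11).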

The bounds above have a simple generalization to the case of particles belonging to higher dimensional Hilbert
spaces. But in these cases the bounds are no longer sharp.

In order to appreciate the above results, we must recall some results from classical statistical inference. This is the 
subject of the next section. 

II. CLASSICAL CRAMER-RAO BOUND 

Consider a random variable $X$ with probability density $p(x|\theta)$. The connection with the quantum problem is that
we can view $p(x|\theta)$ as the probability density that a quantum measurement on the system yields outcome $x$ given
that the state was $\rho(\theta)$. We take a random sample of size $N$ from the distribution and use it to estimate the value of
each parameter $\theta_i$. Call $\hat\theta^N_i$ the estimated value. The following results about the m.q.e. matrix of the estimator are
well known:

1. Suppose that the estimator is unbiased, that is $E_\theta(\hat\theta^N - \theta) = 0$, where $E_\theta$ is the expectation value at fixed $\theta$,
i.e., the integral with respect to $dx\, p(x|\theta)$. Define its m.q.e. matrix $V^N(\theta)$ by

$$V^N_{ij}(\theta) = E_\theta\big((\hat\theta^N_i - \theta_i)(\hat\theta^N_j - \theta_j)\big). \qquad (12)$$

Furthermore define the Fisher information matrix $I(\theta)$ by

$$I_{ij}(\theta) = E_\theta\big(\partial_{\theta_i}\ln p(X|\theta)\,\partial_{\theta_j}\ln p(X|\theta)\big) = \int dx\, \frac{\partial_{\theta_i}p(x|\theta)\,\partial_{\theta_j}p(x|\theta)}{p(x|\theta)}. \qquad (13)$$

Then for any $N$, the following inequalities, known as the Cramér-Rao inequalities, hold [19,?]

$$V^N(\theta) \ge I(\theta)^{-1}/N \qquad (14)$$

or equivalently

$$V^N(\theta)^{-1} \le N I(\theta), \qquad (15)$$

the inequality meaning that the difference of the two sides is a nonnegative matrix.

2. The hypothesis of unbiased estimators is very restrictive since most estimators will be biased. Happily it is
possible to relax this condition. Here are just two of the many results available:

(a) First of all, if one is interested in averaging the mean cost over possible values of $\theta$ with respect to a
given prior distribution $\lambda(\theta)$, then there is a Bayesian version of the Cramér-Rao inequality, the van Trees
inequality [20,21]. In the multivariate case, upon giving oneself a quadratic cost function determined by a
matrix $C(\theta)$, one can derive the inequality

$$\int d\theta\, \lambda(\theta)\,\mathrm{tr}\, C(\theta)V^N(\theta) \ge \frac{1}{N}\int d\theta\, \lambda(\theta)\,\mathrm{tr}\, C(\theta)I(\theta)^{-1} - \frac{\alpha}{N^2} \qquad (16)$$

where $\alpha$ is a positive number that depends on $C(\theta)$, $I(\theta)$, and $\lambda(\theta)$, but is independent of $N$.

(b) The second approach makes no reference to any prior distribution for $\theta$, but only holds in the limit $N$
tending to infinity and lays a mild restriction on the estimators considered. Specifically, if the probability
distribution of $\sqrt{N}(\hat\theta^N - \theta)$ converges uniformly in $\theta$ towards a distribution depending continuously on
$\theta$, say of a random vector $Z$, then the limiting scaled m.q.e. matrix $W(\theta)$ defined by $W_{ij}(\theta) = E_\theta(Z_iZ_j)$
obeys $W \ge I^{-1}$.

3. Furthermore in the limit of arbitrarily large samples one can attain the Cramér-Rao bound. This is proven by
explicitly constructing an estimator that attains the bound in the extended senses 2a) (apart from the $1/N^2$
term) or 2b) just indicated: the maximum likelihood estimator (m.l.e.).
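
A one-parameter example may help fix ideas. The Python sketch below is an illustration under stated assumptions, not part of the results quoted above: it takes the two-outcome distribution $p(\pm 1|\theta) = (1 \pm \theta)/2$ (the one produced by the spin measurements of the introduction), evaluates the discrete analogue of (13), and compares the exact variance of the sample mean with the bound (14). The function names are hypothetical.

```python
import numpy as np


def fisher_information(theta):
    """Discrete analogue of (13) for one draw from
    p(+1|theta) = (1 + theta)/2, p(-1|theta) = (1 - theta)/2."""
    probs = np.array([(1 + theta) / 2, (1 - theta) / 2])
    dprobs = np.array([0.5, -0.5])   # derivatives of the two probabilities
    return float(np.sum(dprobs ** 2 / probs))


def sample_mean_variance(theta, n):
    """Exact variance of the mean of n outcomes; the mean is an unbiased
    estimator of theta because each outcome has expectation theta."""
    return (1 - theta ** 2) / n


theta, n = 0.3, 50                   # hypothetical parameter value and sample size
print(sample_mean_variance(theta, n), 1 / (n * fisher_information(theta)))
```

For this model $I(\theta) = 1/(1 - \theta^2)$, so the sample mean attains (14) with equality for every $N$; in general the bound is attained only asymptotically, as stated in point 3.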

Modern statistical theory contains many other results having the same flavour as point 2 above, namely that the
Cramér-Rao bound holds in an approximate sense for large $N$, without the restriction to unbiased estimators. Result
2a) applies to a larger class of estimators than 2b), but only gives a result on the average behaviour over different
values of $\theta$. On the other hand combining results 3) and 2b) tells us that the maximum likelihood estimator is for
large $N$ an optimal estimator for each value of $\theta$ separately. The reason why in 2b) additional regularity is demanded
is the phenomenon of super-efficiency (see [22] for a recent discussion) whereby an estimator can have
mean quadratic error of smaller order than $1/N$ at isolated points. Modern statistical theory (see again [22] or [23])
has concentrated on the more difficult problem of obtaining non-Bayesian results (i.e., pointwise rather than average)
making much use of the technical tool of 'local asymptotic normality'. A major challenge in the quantum case is to
obtain a result of type 2b) when this technique is definitely not available.

III. QUANTUM CRAMER-RAO BOUND 

In this paper we show that similar results to 1, 2a, 2b, 3 can be obtained when one must estimate the state of an
unknown quantum system $\rho(\theta)$ of which one possesses $N$ copies. This problem is most simply addressed, following
[14], by decomposing it into a first (quantum) step in which one carries out a measurement on $\rho^N = \rho\otimes\cdots\otimes\rho$ and
a second (classical) step in which one uses the result of the measurement to estimate the value of the parameters $\theta$.

The most general way to describe the measurement is by a positive operator-valued measurement (POVM) $M = (M_\xi)$
whose elements satisfy $M_\xi \ge 0$, $\sum_\xi M_\xi = I$. (For simplicity we take the outcomes of the POVM to be discrete.
The generalization to an arbitrary outcome space is just a question of translating into measure-theoretic language.)

Quantum mechanics tells us the probability to obtain outcome $\xi$ given state $\rho(\theta)$:

$$p(\xi|\theta) = \mathrm{tr}\,\rho^N(\theta)M_\xi. \qquad (17)$$

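From the outcome $\xi$ of the measurement one must guess the values of the $p$ parameters $\theta_i$. Call $\hat\theta^N$ the
estimated value of the parameter vector. We want to obtain bounds on the m.q.e. matrix $V^N(\theta)$ of the estimator
$\hat\theta^N$ when the true parameter value is $\theta$, thus $V^N_{ij}(\theta) = E_\theta(\hat\theta^N_i - \theta_i)(\hat\theta^N_j - \theta_j)$. To proceed we temporarily make the
simplifying assumption that the estimators are unbiased, $E_\theta\hat\theta^N = \theta$. Then we can apply the classical Cramér-Rao
inequality to the probability distribution $p(\xi|\theta)$ to obtain:

$$V^N \ge I^N(M,\theta)^{-1} \qquad (18)$$

or equivalently

$$(V^N)^{-1} \le I^N(M,\theta) \qquad (19)$$

where the Fisher information matrix $I^N$ for the measurement $M$ is defined by

$$I^N_{ij}(M,\theta) = \sum_\xi \frac{\partial_{\theta_i}p(\xi|\theta)\,\partial_{\theta_j}p(\xi|\theta)}{p(\xi|\theta)} = \sum_\xi \frac{\mathrm{tr}(\rho^N_{,i}M_\xi)\,\mathrm{tr}(\rho^N_{,j}M_\xi)}{\mathrm{tr}(M_\xi\rho^N)} \qquad (20)$$

with $\rho^N_{,i} = \partial_{\theta_i}\rho^N$.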
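
To make (17) and (20) concrete, the following Python sketch computes the Fisher information matrix of a product measurement on $N = 2$ copies of the spin 1/2 state (4) at $\theta_1 = \theta_2 = 0$. It is a minimal illustration under stated assumptions: the helper names `fisher_matrix` and `n_copies` are hypothetical, the derivatives $\rho_{,1} = \sigma_x/2$ and $\rho_{,2} = \sigma_y/2$ are those obtained from (4) to first order, and outcomes of zero probability are simply skipped.

```python
from functools import reduce

import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
ID = np.eye(2, dtype=complex)


def fisher_matrix(rho, drhos, povm):
    """Fisher information (20): sum over outcomes of
    tr(drho_i M) tr(drho_j M) / tr(rho M)."""
    info = np.zeros((len(drhos), len(drhos)))
    for m in povm:
        prob = np.trace(rho @ m).real
        if prob < 1e-14:
            continue  # zero-probability outcomes are skipped in this sketch
        grad = np.array([np.trace(d @ m).real for d in drhos])
        info += np.outer(grad, grad) / prob
    return info


def n_copies(rho, drhos, n):
    """rho^N = rho x ... x rho and its derivatives by the product rule."""
    rho_n = reduce(np.kron, [rho] * n)
    drhos_n = [
        sum(reduce(np.kron, [d if k == site else rho for k in range(n)])
            for site in range(n))
        for d in drhos
    ]
    return rho_n, drhos_n


rho = np.diag([1.0, 0.0]).astype(complex)      # spin up, theta1 = theta2 = 0
drhos = [SX / 2, SY / 2]                       # derivatives of rho from (4)
povm_x = [(ID + SX) / 2, (ID - SX) / 2]        # measurement of sigma_x
povm_y = [(ID + SY) / 2, (ID - SY) / 2]        # measurement of sigma_y

rho2, drhos2 = n_copies(rho, drhos, 2)
info_xx = fisher_matrix(rho2, drhos2, [np.kron(a, b) for a in povm_x for b in povm_x])
info_xy = fisher_matrix(rho2, drhos2, [np.kron(a, b) for a in povm_x for b in povm_y])
print(info_xx)
print(info_xy)
```

Measuring $\sigma_x$ on both copies puts all the information on $\theta_1$, whereas measuring $\sigma_x$ on one copy and $\sigma_y$ on the other shares it equally between the two parameters; in both cases the trace of the Fisher information equals $N = 2$.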

These expressions suggest the following questions: 

1. is there a simple bound for the m.q.e. $V^N$ of unbiased estimators $\hat\theta^N$, or equivalently for the Fisher information
$I^N(M,\theta)$?

2. is the bound also valid for sufficiently well behaved but possibly biased estimators, at least in the limit of large
$N$?

3. can this bound be attained, at least in the limit of a large number of copies $N$?

Most of the work on this subject has been devoted to answering question 1). We now recall what is known
about these questions.

Suppose first the parameter is one-dimensional, $p = 1$. The symmetric logarithmic derivative (s.l.d.) $\lambda_\theta$ of $\rho$ is
the Hermitian matrix defined implicitly by

$$\rho_{,\theta} = \frac{\lambda_\theta\rho + \rho\lambda_\theta}{2}. \qquad (21)$$

In a basis where $\rho$ is diagonal, $\rho = \sum_k p_k|k\rangle\langle k|$, this can be inverted to yield

$$(\lambda_\theta)_{kl} = (\rho_{,\theta})_{kl}\,\frac{2}{p_k + p_l}. \qquad (22)$$

Then we have the bound

$$I^N(M,\theta) \le N\,\mathrm{tr}\,\rho\lambda_\theta\lambda_\theta. \qquad (23)$$

Furthermore it was suggested in [15] how to adapt the classical m.l.e. so as to attain, in the limit of large $N$, the
bound (23).

In the multiparameter case the bound based on the s.l.d. can be generalized in a natural way. Define the s.l.d. along
direction $\theta_i$ by

$$\rho_{,i} = \frac{\lambda_i\rho + \rho\lambda_i}{2}, \qquad (24)$$

and Helstrom's quantum information matrix $H$ by

$$H_{ij} = \mathrm{tr}\,\rho\,\frac{\lambda_i\lambda_j + \lambda_j\lambda_i}{2}. \qquad (25)$$

(This is the same matrix that was introduced for spin 1/2 particles for a particular choice of parameters in (6) and
(8).) Then one can prove the bound [?]

$$I^N(M,\theta) \le NH(\theta). \qquad (26)$$

(This can be deduced directly from (23) as proven in [?]. Indeed since (23) holds for each path in parameter space,
it implies the matrix inequality (26).)
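
The definitions (22), (24) and (25) translate directly into a few lines of linear algebra. The Python sketch below is illustrative only: it solves for the s.l.d. in the eigenbasis of $\rho$, builds Helstrom's matrix for a mixed spin 1/2 state $\rho(\theta) = \frac{1}{2}(I + \sum_i\theta_i\sigma_i)$, and compares the result with the closed form (8). The function names `sld` and `helstrom` and the Bloch vector chosen are hypothetical, and the code assumes that $\rho$ has full rank so that no denominator $p_k + p_l$ vanishes.

```python
import numpy as np

PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def sld(rho, drho):
    """Solve drho = (L rho + rho L)/2 for L in the eigenbasis of rho, as in (22)."""
    p, u = np.linalg.eigh(rho)
    d = u.conj().T @ drho @ u                      # drho in the eigenbasis of rho
    lam = 2 * d / (p[:, None] + p[None, :])        # (22), entry by entry
    return u @ lam @ u.conj().T


def helstrom(rho, drhos):
    """Quantum information matrix (25): H_ij = tr rho (L_i L_j + L_j L_i)/2."""
    lams = [sld(rho, d) for d in drhos]
    n = len(lams)
    h = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            anti = lams[i] @ lams[j] + lams[j] @ lams[i]
            h[i, j] = np.trace(rho @ anti).real / 2
    return h


theta = np.array([0.3, -0.2, 0.5])                 # hypothetical Bloch vector, norm < 1
rho = 0.5 * (np.eye(2) + sum(t * s for t, s in zip(theta, PAULI)))
drhos = [s / 2 for s in PAULI]                     # d rho / d theta_i
h_numeric = helstrom(rho, drhos)
h_closed = np.eye(3) + np.outer(theta, theta) / (1 - theta @ theta)   # tensor (8)
print(np.max(np.abs(h_numeric - h_closed)))
```

The two matrices agree to rounding error, which is the sense in which (8) is Helstrom's matrix for this parameterization.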



However this bound is in general not achievable. Another bound has been proposed based on an asymmetric
logarithmic derivative (a.l.d.) [16] which in some cases is better than (26). Holevo [?] has proposed yet another
bound that is stronger than both the s.l.d. and the a.l.d. bound, but this bound is not explicit: it requires a further
minimization. As far as we know no general achievable bound is known in the multiparameter case.

The difficulty in obtaining a simple bound in the multiparameter case is that there are many inequivalent ways in
which one can minimize the m.q.e. matrix $V^N$. That is, in order to build a good estimator one must make a choice of
what one wants to estimate, and according to this choice the measurement strategy followed will be different. Hence
a bound in the form of a matrix inequality like (26) cannot be expected to be tight.

IV. RESULTS 

In this paper we obtain answers to the three questions raised above in the multiparameter case. Our results are 
summarized in this section. 

We first discuss point 1), that is bounds on the Fisher information. We shall show the following: 

Theorem I: When $\rho(\theta) = |\psi(\theta)\rangle\langle\psi(\theta)|$ is a pure state, then the Fisher information $I^N(M,\theta)$ defined in (20) must
satisfy the following relation

$$\mathrm{tr}\, H^{-1}(\theta)I^N(M,\theta) \le (d-1)N \qquad (27)$$

where $H^{-1}$ is the inverse of the quantum information matrix defined in (25) and $d$ is the dimension of the Hilbert
space to which $\rho(\theta)$ belongs. Note that the inequality (27) is invariant under change of parameterization $\theta \to \theta'(\theta)$.

This result immediately gives an inequality for the mean quadratic error matrix of unbiased estimators $\hat\theta^N$ by
invoking the classical Cramér-Rao inequality in order to replace $I^N(M,\theta)$ by the inverse of the m.q.e. $V^N(\theta)$:

$$\mathrm{tr}\, H^{-1}(\theta)\big(NV^N(\theta)\big)^{-1} \le d-1. \qquad (28)$$

Theorem II: When $\rho(\theta)$ is a mixed state, and if the measurement $M$ consists of separate measurements on each
particle, then the Fisher information also satisfies (27). Hence for separable measurements on a mixed state, the
m.q.e. matrix of an unbiased estimator satisfies (28).

Theorem III (non additivity of optimal Fisher information): In the case of mixed states, it is in general
possible to devise a collective measurement for which the Fisher information does not satisfy the inequality (27).

The second part of the paper consists in proving that the constraint (28) also holds for biased estimators under
suitable additional conditions. We give two forms of this generalization of (28), corresponding to the two forms 2a)
and 2b) of the generalized classical Cramér-Rao inequality.

Consider $N$ copies of a state $\rho(\theta)$. If $\rho$ is pure we can make either collective or separable measurements. If $\rho$ is mixed
we restrict ourselves to separable measurements (since Theorem III shows that in this case collective measurements
can beat (27)). Based on the outcome of the measurement we estimate the value of the parameter vector $\theta$. Call $\hat\theta^N$
the estimator, and denote by $V^N = V^N(\theta)$ its m.q.e. matrix when the true value of the parameter is $\theta$.

We shall prove the following generalization of the result of type 2b) concerning the behaviour of the mean quadratic
error matrix as $N$ tends to infinity:

Theorem IV: Suppose that the scaled m.q.e. $NV^N(\theta)$ has the limit $W(\theta)$ as $N \to \infty$. Suppose that the convergence
is uniform in $\theta$ and that $W$ is continuous at the point $\theta = \theta^0$. Furthermore we suppose that $H$ and its derivatives are
bounded in a neighbourhood of this point. Then we shall prove in section VI that $W(\theta^0)$ must satisfy

$$\mathrm{tr}\, H^{-1}(\theta^0)W^{-1}(\theta^0) \le d-1. \qquad (29)$$

This result gives a bound on the mean value of a quadratic cost function $C$ as $N$ tends to infinity. Indeed using a
Lagrange multiplier to impose the condition (29), the minimum cost is readily found to be

$$\lim_{N\to\infty} N\,\mathrm{tr}\, C(\theta^0)V^N(\theta^0) \ge \frac{\Big(\mathrm{tr}\sqrt{H^{-1/2}(\theta^0)C(\theta^0)H^{-1/2}(\theta^0)}\Big)^2}{d-1}. \qquad (30)$$
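
The Lagrange multiplier computation can be made explicit. Writing $A = H^{-1/2}CH^{-1/2}$ and assuming $C$ is positive definite, minimizing $\mathrm{tr}\, CW$ under $\mathrm{tr}\, H^{-1}W^{-1} = d-1$ gives $W = \frac{\mathrm{tr}\sqrt{A}}{d-1}H^{-1/2}A^{-1/2}H^{-1/2}$ with cost $(\mathrm{tr}\sqrt{A})^2/(d-1)$. The Python sketch below checks this numerically; it is a worked illustration, and the matrices $H$ and $C$ used, as well as the names `power_sym` and `optimal_scaled_mqe`, are hypothetical choices rather than quantities taken from the text.

```python
import numpy as np


def power_sym(a, exponent):
    """Real power of a symmetric positive definite matrix via its eigenbasis."""
    w, v = np.linalg.eigh(a)
    return (v * w ** exponent) @ v.T


def optimal_scaled_mqe(h, c, d):
    """Minimizer of tr(C W) subject to tr(H^{-1} W^{-1}) = d - 1,
    together with the minimal value of tr(C W)."""
    h_inv_sqrt = power_sym(h, -0.5)
    a = h_inv_sqrt @ c @ h_inv_sqrt
    a = (a + a.T) / 2                          # symmetrize against rounding
    tr_sqrt_a = np.trace(power_sym(a, 0.5))
    w_opt = (tr_sqrt_a / (d - 1)) * h_inv_sqrt @ power_sym(a, -0.5) @ h_inv_sqrt
    return w_opt, tr_sqrt_a ** 2 / (d - 1)


d = 3                                          # hypothetical Hilbert space dimension
h = 4.0 * np.eye(4)                            # hypothetical H for 2d - 2 = 4 parameters
c = np.diag([1.0, 2.0, 3.0, 4.0])              # hypothetical quadratic cost matrix
w_opt, bound = optimal_scaled_mqe(h, c, d)
print(np.trace(np.linalg.inv(h) @ np.linalg.inv(w_opt)), np.trace(c @ w_opt), bound)
```

The matrix returned saturates the constraint (29) and its cost coincides with the right hand side of the bound, so under these assumptions the bound is exactly the constrained minimum.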

In terms of a cost function, it is also possible to prove a Bayesian version of the Cramér-Rao inequality which is
the analogue of the classical result 2a):

Theorem V: Suppose that one is given a quadratic cost function $C(\theta)$ and a prior distribution $\lambda(\theta)$ for the parameters
$\theta$. If $C$, $\lambda$ and $H$ are sufficiently smooth functions of $\theta$ (continuity of the first derivatives is sufficient), while $\lambda$ is zero
outside a compact region with smooth boundary, then

$$\int d\theta\,\lambda(\theta)\,\mathrm{tr}\, C(\theta)V^N(\theta) \ge \frac{1}{N}\int d\theta\,\lambda(\theta)\,\frac{\Big(\mathrm{tr}\sqrt{H^{-1/2}(\theta)C(\theta)H^{-1/2}(\theta)}\Big)^2}{d-1} - \frac{\alpha}{N^2} \qquad (31)$$

where $\alpha$ is a constant independent of $N$ but which depends on $C$, $\lambda$ and $H$.

Theorems I, II, IV and V put bounds on the m.q.e. matrix of an estimator of an unknown state $\rho(\theta)$ (for mixed
states, under the restriction that the measurement is separable). The third part of this article is devoted to showing
that in the case of spin 1/2 systems ($d = 2$) these bounds can be attained. We first show that at any point $\theta^0$ we can
attain equality in (27).

Theorem VI: Suppose one has $N$ spin 1/2 particles in an unknown (possibly mixed) state $\rho(\theta)$. Fix any point $\theta^0$. Give
yourself a matrix $G^0$ satisfying $\mathrm{tr}\, H^{-1}(\theta^0)G^0 \le 1$. We call $G^0$ the target scaled information matrix. Then there exists
a measurement $M^0$ (depending on the choice of $\theta^0$) acting on each spin separately such that $I^N(M^0,\theta^0) = NG^0$.

This measurement is described in detail in section VII A.



For large $N$ we can also approximately attain equality at all points $\theta$ simultaneously:

Theorem VII: Suppose one has $N$ spin 1/2 particles in an unknown pure state $|\psi(\theta)\rangle$ or suppose that one has $N$
spin 1/2 particles in an unknown mixed state $\rho(\theta)$. In the latter case we also require that the state never be pure, i.e.
$\mathrm{tr}\,\rho(\theta)^2 < 1$ for all $\theta$.

Give oneself a smooth positive matrix function $G(\theta)$ satisfying $\mathrm{tr}\, H^{-1}(\theta)G(\theta) \le 1$ for all $\theta$, the target scaled
information for each possible value of $\theta$. Define the corresponding target scaled m.q.e. matrix $W(\theta) = G(\theta)^{-1}$.
Suppose that $W(\theta)$ is non singular (i.e. $G(\theta)$ never has a zero eigenvalue).

Then there exists a measurement $M$ acting on each spin separately, and a corresponding estimator $\hat\theta^N$, whose m.q.e.
matrix $V^N(\theta)$ satisfies

$$V^N(\theta) = \frac{W(\theta)}{N} + o(1/N) \qquad (32)$$

for all values of $\theta$ simultaneously. For this estimation strategy $\sqrt{N}(\hat\theta^N - \theta)$ converges in distribution towards $\mathcal{N}(0, W)$,
the normal distribution with mean zero and covariance $W$. The measurement $M$ and estimation strategy are described
in detail in section VII B.



It is interesting to note that the measurement strategy which satisfies (32) is an adaptive one. That is, one first
carries out a measurement on a small fraction of the particles. This gives a preliminary estimate of the quantum
state which allows a fine tuning of the measurements that are carried out on the remaining particles. This is to be
contrasted with previously proposed state estimation strategies in the case of finite dimensional Hilbert spaces [?] in
which the same measurement is carried out on all the particles. The necessity of an adaptive measurement strategy
if one wants to minimise the m.q.e. was pointed out in [15].

When the unknown state belongs to a Hilbert space of dimension $d > 2$, then the bound (27) cannot be attained
in general. Indeed we shall show in section V F that for $d > 2$, neither (26) nor (27) implies the other.



V. NEW QUANTUM CRAMER-RAO INEQUALITY 



In this section we prove Theorems I, II, III. That is we prove (27) for general measurements in the case of pure
states and for separate measurements on each particle in the case of mixed states.



A. Preliminary results 

The first step in proving (27) is to show that one can restrict oneself to POVM's whose elements are proportional
to one dimensional projectors. Indeed any POVM can always be refined to yield a POVM whose elements are
proportional to one dimensional projectors. We call such a measurement exhaustive. This yields a refined probability
distribution $p(\xi|\theta)$. It is well known that under such refining of the probability distribution, the Fisher information
can only increase [24].



The second step in proving (27) consists in increasing the number of parameters. Suppose that $\rho(\theta)$ depends on $p$
parameters $\theta_i$, $i = 1, \ldots, p$. If $\rho = |\psi(\theta)\rangle\langle\psi(\theta)|$ is a pure state, then $p \le 2d-2$ (since $|\psi(\theta)\rangle$ is normalized and defined
up to a phase). If $\rho$ is a mixed state, then Hermiticity and the condition $\mathrm{tr}\,\rho = 1$ impose that $p \le d^2-1$. Suppose
that $p$ is less than the maximum number $\nu$ of possible parameters ($\nu = 2d-2$ or $\nu = d^2-1$ according to whether
the state is pure or mixed). Then one can always increase the number of parameters up to the maximum. Indeed let
us suppose that to the $p$ parameters, one adds independent parameters $\theta_{i'}$, $i' = p+1, \ldots, \nu$. We can now consider
the quantum information matrix $\tilde H$, and Fisher information matrix $\tilde I$, for the completed set of parameters. We shall
show below that

$$\mathrm{tr}\, H^{-1}(\theta)I^N(M,\theta) \le \mathrm{tr}\,\tilde H^{-1}(\theta)\tilde I^N(M,\theta). \qquad (33)$$

Therefore it will be sufficient to prove (27) in the case when there are $\nu$ parameters.

To prove (33), fix a particular point $\theta^0$. At this point we have the derivative $\rho_{,i}$ and s.l.d. $\lambda_i$ of $\rho$ for $i = 1, \ldots, p$.
Introduce a set of Hermitian matrices $\lambda_{i'}$ with $\mathrm{tr}\,\rho(\theta^0)\lambda_{i'} = 0$, for $i' = p+1, \ldots, \nu$, such that

$$\mathrm{tr}\,\rho(\theta^0)\,\frac{\lambda_i\lambda_{i'} + \lambda_{i'}\lambda_i}{2} = 0, \qquad i = 1, \ldots, p, \qquad i' = p+1, \ldots, \nu. \qquad (34)$$

This is always possible because we can view (34) as a scalar product between $\lambda_i$ and $\lambda_{i'}$ and a Gram-Schmidt
orthogonalization procedure will then yield the matrices $\lambda_{i'}$. Now define matrices $\rho_{,i'}$ by $\rho_{,i'} = (\rho(\theta^0)\lambda_{i'} + \lambda_{i'}\rho(\theta^0))/2$
and define additional parameters $\theta_{i'}$ satisfying, at $\theta^0$, $\partial_{\theta_{i'}}\rho = \rho_{,i'}$. The point of this construction is that because
of (34), the quantum information matrix $\tilde H$ is block diagonal with the first block equal to $H$. Let $\tilde I(M)$ be the
Fisher information matrix for the enlarged set of parameters (but the same measurement). Then $\mathrm{tr}\,\tilde H^{-1}\tilde I(M) =
\mathrm{tr}\,(\tilde H^{-1})_{11}\tilde I_{11}(M) + \mathrm{tr}\,(\tilde H^{-1})_{22}\tilde I_{22}(M)$ where the indices 11 and 22 denote the blocks of these matrices corresponding
to the original and the new parameters. But both terms are nonnegative since all matrices involved are nonnegative,
and $(\tilde H^{-1})_{11} = H^{-1}$, so we obtain (33) at $\theta^0$ and for the particular parameters just introduced. But since the right
hand side of (33) is invariant under reparametrization, it is true for any parametrization, and at any $\theta$.

B. Pure states 

To proceed we shall consider a POVM whose elements are proportional to one dimensional projectors and calculate
explicitly the left hand side of (27) in the case where the number of parameters is the maximum $p = 2d-2$, in a basis
where $H$ is diagonal.

We fix a point $\theta^0$. At this point we choose a basis such that

$$\rho(\theta^0) = |1\rangle\langle 1|. \qquad (35)$$

Hence the density matrix of the $N$ copies is

$$\rho^N = |1\rangle\langle 1|\otimes\cdots\otimes|1\rangle\langle 1|. \qquad (36)$$

Consider now the $2d-2$ Hermitian operators

$$\rho_{,k+} = |1\rangle\langle k| + |k\rangle\langle 1|, \qquad 1 < k \le d,$$
$$\rho_{,k-} = i|1\rangle\langle k| - i|k\rangle\langle 1|, \qquad 1 < k \le d. \qquad (37)$$
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We choose a parameterisation such that in the vicinity of 0°, it has the form p — p{0^) + J2k ±{^k± — S^±)p.k± with 
the unknown parameters 0k±, k — 2, . . . ,d. With this paramctrisation the derivatives of p^ are 

p'l± = P-k± <E) p-.-fE) p+ ■■■ + p<E) ■■■<E) p.k± ■ (38) 

One then calculates the s.l.d. of p and hence the quantum information matrix H. One verifies that in this basis H 
is diagonal: 

Hk±,k'±' = 44fc"5±±' . (39) 

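Consider any POVM whose elements are proportional to one dimensional projectors

$$M_\xi = |\psi_\xi\rangle\langle\psi_\xi| , \qquad |\psi_\xi\rangle = \sum_{k_1=1}^{d} \cdots \sum_{k_N=1}^{d} a_{\xi k_1 \ldots k_N} |k_1 \ldots k_N\rangle . \tag{40}$$

The completeness relation $\sum_\xi M_\xi = I$ takes the form

$$\sum_\xi a^*_{\xi k_1 \ldots k_N} a_{\xi k'_1 \ldots k'_N} = \delta_{k_1 k'_1} \cdots \delta_{k_N k'_N} . \tag{41}$$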

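To proceed we need the following formulae:

$$\mathrm{tr}\, \rho^N(\theta^0) M_\xi = |a_{\xi 1 \ldots 1}|^2 \tag{42}$$

and

$$\mathrm{tr}\, \rho^N_{,k+}(\theta^0) M_\xi = \sum_{p=1}^{N} \left( a^*_{\xi 1 \ldots 1} a_{\xi 1 \ldots k_p = k \ldots 1} + a^*_{\xi 1 \ldots k_p = k \ldots 1} a_{\xi 1 \ldots 1} \right) \tag{43}$$

and similarly for $\mathrm{tr}\, \rho^N_{,k-}(\theta^0) M_\xi$. Thus we obtain

$$\left(\mathrm{tr}\, \rho^N_{,k+}(\theta^0) M_\xi\right)^2 + \left(\mathrm{tr}\, \rho^N_{,k-}(\theta^0) M_\xi\right)^2 = 4 |a_{\xi 1 \ldots 1}|^2 \Big| \sum_{p=1}^{N} a_{\xi 1 \ldots k_p = k \ldots 1} \Big|^2 . \tag{44}$$

Putting everything together, and noting that by the completeness relation (41) the cross terms with $p \ne p'$ vanish upon summing over $\xi$, yields

$$\mathrm{tr}\, H^{-1} I(M) = \sum_\xi \frac{1}{\mathrm{tr}\, \rho^N(\theta^0) M_\xi} \sum_{k=2}^{d} \frac{\left(\mathrm{tr}\, \rho^N_{,k+}(\theta^0) M_\xi\right)^2 + \left(\mathrm{tr}\, \rho^N_{,k-}(\theta^0) M_\xi\right)^2}{4}$$

$$= \sum_{k=2}^{d} \sum_{p=1}^{N} \sum_\xi |a_{\xi 1 \ldots k_p = k \ldots 1}|^2$$

$$= N(d-1) \tag{45}$$

which proves that equality holds in (27) for arbitrary exhaustive measurements in the case of pure states.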
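As a numerical sanity check of (45), the following minimal sketch evaluates $\mathrm{tr}\, H^{-1} I(M)$ for a single copy ($N = 1$) of the pure state $|1\rangle$, using formulae (42)-(44) with $H = 4 \cdot \mathbb{1}$ from (39). The choice of the discrete Fourier basis as the exhaustive measurement, and the function name `trace_Hinv_I_pure`, are illustrative assumptions and not part of the derivation above.

```python
import cmath
import math


def trace_Hinv_I_pure(d):
    """tr H^{-1} I(M) for one copy of the pure state |1> in dimension d.

    The measurement is the orthonormal discrete Fourier basis (an
    illustrative choice of exhaustive POVM with all a_{xi 1} nonzero).
    """
    omega = cmath.exp(2j * math.pi / d)
    # a[xi][k] = <k|psi_xi>; list index 0 plays the role of the vector |1>.
    a = [[omega ** (xi * k) / math.sqrt(d) for k in range(d)] for xi in range(d)]
    total = 0.0
    for xi in range(d):
        prob = abs(a[xi][0]) ** 2               # tr rho M_xi, eq. (42)
        for k in range(1, d):
            overlap = a[xi][0].conjugate() * a[xi][k]
            d_plus = 2.0 * overlap.real         # tr rho_{,k+} M_xi, eq. (43)
            d_minus = -2.0 * overlap.imag       # tr rho_{,k-} M_xi
            # H = 4 * identity, so H^{-1} contributes a factor 1/4.
            total += 0.25 * (d_plus ** 2 + d_minus ** 2) / prob
    return total


for dim in (2, 3, 5):
    # Each value should agree with d - 1 up to rounding error.
    print(dim, trace_Hinv_I_pure(dim))
```

Any other orthonormal basis with $a_{\xi 1} \ne 0$ for all $\xi$ could be substituted; the sum depends only on the completeness relation (41).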
C. One mixed state

Deriving (27) for mixed states is more complicated than for pure states, and we shall proceed in two steps. First we shall
consider the case of one mixed state ($N = 1$) and show that equality in (27) holds in this case for arbitrary exhaustive
measurements. Then we shall consider the case of an arbitrary number $N$ of mixed states.

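We first diagonalize $\rho$ at a point $\theta^0$: $\rho(\theta^0) = \sum_{k=1}^{d} p_k |k\rangle\langle k|$. We now introduce the following complete set of
Hermitian traceless matrices:

$$\rho_{,kl+} = |k\rangle\langle l| + |l\rangle\langle k| , \quad k < l ,$$

$$\rho_{,kl-} = i|k\rangle\langle l| - i|l\rangle\langle k| , \quad k < l ,$$

$$\rho_{,m} = \sum_{k=1}^{d} c_{mk} |k\rangle\langle k| , \quad m = 1, \ldots, d-1 \tag{46}$$

where the coefficients $c_{mk}$ obey

$$\sum_k c_{mk} = 0 , \qquad \sum_k \frac{c_{mk} c_{m'k}}{p_k} = \delta_{mm'} . \tag{47}$$

Let us denote the matrices $\rho_{,kl\pm}$ and $\rho_{,m}$ collectively as $\rho_{,i}$. (They constitute a set of generators of su(d).)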

We choose a parameterization such that in the vicinity of $\theta^0$, it has the form $\rho = \rho(\theta^0) + \sum_i (\theta_i - \theta^0_i) \rho_{,i}$. One
then calculates the s.l.d. of $\rho$ and from this the quantum information matrix $H$. One verifies that in this basis $H$ is
diagonal:

$$H_{kl\pm,k'l'\pm'} = \frac{4}{p_k + p_l} \delta_{kk'} \delta_{ll'} \delta_{\pm\pm'} , \qquad H_{kl\pm,m} = 0 , \qquad H_{m,m'} = \delta_{mm'} . \tag{48}$$
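Consider any POVM whose elements are proportional to one dimensional projectors

$$M_\xi = |\psi_\xi\rangle\langle\psi_\xi| , \qquad |\psi_\xi\rangle = \sum_{k=1}^{d} a_{\xi k} |k\rangle . \tag{49}$$

The l.h.s. of (27) can now be written as

$$\mathrm{tr}\, H^{-1} I(M) = \sum_\xi \frac{1}{\langle\psi_\xi|\rho|\psi_\xi\rangle} \left( \sum_{k<l} \sum_{\pm} \frac{p_k + p_l}{4} \langle\psi_\xi|\rho_{,kl\pm}|\psi_\xi\rangle^2 + \sum_m \langle\psi_\xi|\rho_{,m}|\psi_\xi\rangle^2 \right) . \tag{50}$$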



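Using the following expressions

$$\langle\psi_\xi|\rho|\psi_\xi\rangle = \sum_k p_k |a_{\xi k}|^2 , \qquad \langle\psi_\xi|\rho_{,m}|\psi_\xi\rangle = \sum_k c_{mk} |a_{\xi k}|^2 ,$$

$$\langle\psi_\xi|\rho_{,kl+}|\psi_\xi\rangle^2 + \langle\psi_\xi|\rho_{,kl-}|\psi_\xi\rangle^2 = 4 |a_{\xi k}|^2 |a_{\xi l}|^2 \tag{51}$$

one obtains

$$\mathrm{tr}\, H^{-1} I(M) = \sum_\xi \frac{1}{\langle\psi_\xi|\rho|\psi_\xi\rangle} \left( \sum_{k \ne l} p_k |a_{\xi k}|^2 |a_{\xi l}|^2 + \sum_k \sum_l |a_{\xi k}|^2 |a_{\xi l}|^2 \sum_m c_{mk} c_{ml} \right) . \tag{52}$$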



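We now use the following relation

$$\sum_m c_{mk} c_{ml} = \delta_{kl} p_k - p_k p_l \tag{53}$$

which is derived from (47) as follows: define $v_{mk} = c_{mk}/\sqrt{p_k}$ ($m = 1, \ldots, d-1$) and $v_{dk} = \sqrt{p_k}$. Then (47) can be
rewritten as $\sum_k v_{mk} v_{m'k} = \delta_{mm'}$. The vectors $v_m$ therefore are a complete orthonormal basis of $R^d$, hence they
obey $\sum_m v_{mk} v_{mk'} = \delta_{kk'}$. Reexpressing in terms of $c_{mk}$ yields (53). Inserting it in (52) we obtain

$$\mathrm{tr}\, H^{-1} I(M) = \sum_\xi \frac{1}{\sum_k p_k |a_{\xi k}|^2} \left[ \sum_k p_k |a_{\xi k}|^2 \sum_l (1 - p_l) |a_{\xi l}|^2 \right]$$

$$= \sum_k (1 - p_k) \sum_\xi |a_{\xi k}|^2 = \sum_\xi \mathrm{tr}\, (1 - \rho) M_\xi$$

$$= d - 1 \tag{54}$$

as announced.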

Note that this has demonstrated that equality holds in (27) whenever $N = 1$, $p = d^2 - 1$, and the POVM is
exhaustive. It follows from the classical properties of the Fisher information that equality also holds for arbitrary
$N$ whenever the POVM can be considered as a sequence of $N$ separate exhaustive measurements on each copy of
the system. It also holds if the $n$'th measurement is chosen at random depending on the outcomes of the previous
measurements.

D. Separable measurements on N mixed states 

We shall now prove that if we possess $N$ identical mixed states of spin 1/2 particles, and carry out separable
measurements, then

$$\mathrm{tr}\, H^{-1} I(M) \le N(d-1) . \tag{55}$$

We recall that a separable measurement is one that can be carried out sequentially on separate particles, where the 
measurement on one particle at any stage (and indeed which particle to measure: one is allowed to measure particles 
several times) can depend arbitrarily on the outcomes so far, see Ijl^l for a discussion. It is therefore more general 
than the case considered at the end of the previous subsection where the measurement on the nth particle could only 
depend on the measurements carried out on the n — 1 previous particles. 

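If a POVM is separable, then its elements $M_\xi$ can be decomposed into a sum of terms proportional to projectors
onto unentangled states

$$M_\xi = \sum_i |\psi_{\xi i}\rangle\langle\psi_{\xi i}| , \qquad |\psi_{\xi i}\rangle = |\psi^1_{\xi i}\rangle \otimes \cdots \otimes |\psi^N_{\xi i}\rangle . \tag{56}$$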

We call measurements having such a representation nonentangled. (Note that there exist nonentangled POVMs which
are not separable |p8[). 

By refining a separable measurement (which increases the Fisher information) one can restrict oneself to measurements
whose POVM elements are proportional to projectors onto product states

$$M_\xi = |\psi_\xi\rangle\langle\psi_\xi| = |\psi^1_\xi\rangle\langle\psi^1_\xi| \otimes \cdots \otimes |\psi^N_\xi\rangle\langle\psi^N_\xi| . \tag{57}$$

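We now evaluate the l.h.s. of (55) for measurements of the form (57). First recall that the $N$ unknown states have
the form

$$\rho^N = \rho \otimes \cdots \otimes \rho = \sum_{k_1=1}^{d} \cdots \sum_{k_N=1}^{d} p_{k_1} \cdots p_{k_N} |k_1 \ldots k_N\rangle\langle k_1 \ldots k_N| \tag{58}$$

and the derivatives of $\rho^N$ have the form

$$\rho^N_{,i} = \rho_{,i} \otimes \rho \otimes \cdots \otimes \rho + \cdots + \rho \otimes \cdots \otimes \rho \otimes \rho_{,i} = \sum_{p=1}^{N} \rho \otimes \cdots \rho_{,i} \cdots \otimes \rho \tag{59}$$

where in the second rewriting it is understood that $\rho_{,i}$ is at the $p$'th position in the product.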
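Using the product form of measurement (57), one finds that

$$\langle\psi_\xi|\rho^N_{,i}|\psi_\xi\rangle = \sum_p \langle\psi^1_\xi|\rho|\psi^1_\xi\rangle \cdots \langle\psi^p_\xi|\rho_{,i}|\psi^p_\xi\rangle \cdots \langle\psi^N_\xi|\rho|\psi^N_\xi\rangle . \tag{60}$$

Inserting these expressions into the Fisher information matrix one finds

$$I_{ij}(M) = \sum_\xi \frac{\langle\psi_\xi|\rho^N_{,i}|\psi_\xi\rangle \langle\psi_\xi|\rho^N_{,j}|\psi_\xi\rangle}{\langle\psi_\xi|\rho^N|\psi_\xi\rangle}$$

$$= \sum_\xi \sum_{p \ne p'} \langle\psi^1_\xi|\rho|\psi^1_\xi\rangle \cdots \langle\psi^p_\xi|\rho_{,i}|\psi^p_\xi\rangle \cdots \langle\psi^{p'}_\xi|\rho_{,j}|\psi^{p'}_\xi\rangle \cdots \langle\psi^N_\xi|\rho|\psi^N_\xi\rangle + \sum_\xi \sum_p \langle\psi_\xi|\rho^N|\psi_\xi\rangle \frac{\langle\psi^p_\xi|\rho_{,i}|\psi^p_\xi\rangle \langle\psi^p_\xi|\rho_{,j}|\psi^p_\xi\rangle}{\langle\psi^p_\xi|\rho|\psi^p_\xi\rangle^2}$$

$$= \sum_\xi \sum_p \langle\psi_\xi|\rho^N|\psi_\xi\rangle \frac{\langle\psi^p_\xi|\rho_{,i}|\psi^p_\xi\rangle \langle\psi^p_\xi|\rho_{,j}|\psi^p_\xi\rangle}{\langle\psi^p_\xi|\rho|\psi^p_\xi\rangle^2} \tag{61}$$

where we have used the fact that the first term in the second equality vanishes. Indeed it is equal to

$$\sum_\xi \sum_{p \ne p'} \langle\psi_\xi| \rho \otimes \cdots \otimes \rho_{,i} \otimes \cdots \otimes \rho_{,j} \otimes \cdots \otimes \rho |\psi_\xi\rangle . \tag{62}$$

The sum over $\xi$ can be carried out in (62) to yield the identity matrix and the resulting trace vanishes because
$\mathrm{tr}\, \rho \otimes \cdots \otimes \rho_{,i} \otimes \cdots \otimes \rho_{,j} \otimes \cdots \otimes \rho = 0$.

We now insert (61) into $\mathrm{tr}\, H^{-1} I(M)$. All the operations from (50) to (54) can be carried out exactly as in the
previous subsection, and one arrives at the expression

$$\mathrm{tr}\, H^{-1} I(M) = \sum_p \sum_\xi \langle\psi_\xi| \rho \otimes \cdots \otimes (I - \rho) \otimes \cdots \otimes \rho |\psi_\xi\rangle = N(d-1) \tag{63}$$

where $I - \rho$ stands at the $p$'th position in the product, which is the sought for relation.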
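E. Inequality for more than one mixed state

We now provide a counterexample showing that if one carries out a collective measurement on $N > 1$ mixed states
one can violate (27). We take $N = 2$, and suppose the unknown states belong to a 2 dimensional Hilbert space:
$\rho(\theta) = \frac{1}{2}(I + \sum_i \theta_i \sigma_i)$. We take as reference point $\theta_i = 0$ corresponding to $\rho = I/2$. At this point $H_{ij}(\theta_i = 0) = \delta_{ij}$.
We consider as measurement on the two copies the following POVM

$$M_\xi = \Big\{ \tfrac{1}{2} |{\uparrow_x}{\uparrow_x}\rangle\langle{\uparrow_x}{\uparrow_x}| , \; \tfrac{1}{2} |{\downarrow_x}{\downarrow_x}\rangle\langle{\downarrow_x}{\downarrow_x}| , \; \tfrac{1}{2} |{\uparrow_y}{\uparrow_y}\rangle\langle{\uparrow_y}{\uparrow_y}| , \; \tfrac{1}{2} |{\downarrow_y}{\downarrow_y}\rangle\langle{\downarrow_y}{\downarrow_y}| ,$$

$$\tfrac{1}{2} |{\uparrow_z}{\uparrow_z}\rangle\langle{\uparrow_z}{\uparrow_z}| , \; \tfrac{1}{2} |{\downarrow_z}{\downarrow_z}\rangle\langle{\downarrow_z}{\downarrow_z}| , \; \tfrac{1}{2} |{\uparrow_z}{\downarrow_z} - {\downarrow_z}{\uparrow_z}\rangle\langle{\uparrow_z}{\downarrow_z} - {\downarrow_z}{\uparrow_z}| \Big\} . \tag{64}$$

This POVM cannot be realized by separate measurements on each particle since the last term projects onto an
entangled state.

For this POVM one calculates that $I_{ij}(M, \theta_i = 0) = \delta_{ij}$. Hence the left hand side of (27) evaluates to
$\sum_{ij} H^{-1}_{ij}(\theta_i = 0) I_{ij}(M, \theta_i = 0) = 3 > N(d-1) = 2$. This shows that the optimal Fisher information is non additive.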
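The value 3 quoted above can be checked numerically. The sketch below is only an illustration under stated assumptions: the state is taken as $\rho(\theta) = \frac{1}{2}(I + \theta \cdot \sigma)$, the six product elements of (64) are assumed to carry weight 1/2 (which makes the POVM complete), and the probability of the entangled element is obtained from completeness rather than from an explicit $4 \times 4$ matrix. The function names are hypothetical.

```python
# The six product directions +-x, +-y, +-z appearing in the POVM (64).
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def outcome_probs(theta):
    """Outcome probabilities of the POVM (64) on rho(theta) x rho(theta)."""
    probs = []
    for n in DIRECTIONS:
        # <n|rho|n> for a single copy with Bloch vector theta.
        single = 0.5 * (1.0 + sum(t * c for t, c in zip(theta, n)))
        # Assumed weight 1/2 on each product projector |n n><n n|.
        probs.append(0.5 * single ** 2)
    # The entangled (singlet) element is fixed by completeness.
    probs.append(1.0 - sum(probs))
    return probs


def fisher_information(theta, h=1e-6):
    """Classical Fisher information matrix via central finite differences."""
    probs = outcome_probs(theta)
    grads = []
    for i in range(3):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        p_up, p_dn = outcome_probs(up), outcome_probs(dn)
        grads.append([(u - v) / (2 * h) for u, v in zip(p_up, p_dn)])
    size = len(probs)
    return [[sum(grads[i][x] * grads[j][x] / probs[x] for x in range(size))
             for j in range(3)] for i in range(3)]


info = fisher_information([0.0, 0.0, 0.0])
# H is the identity at theta = 0, so tr H^{-1} I is just the trace of I.
trace = sum(info[i][i] for i in range(3))
print(trace)  # to be compared with N(d - 1) = 2
```

The singlet outcome carries no first-order information at $\theta = 0$; the excess over $N(d-1)$ comes entirely from the correlated product elements, which no separable scheme can realize together with the singlet projector.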

F. Comparison with other Quantum Cramer-Rao bounds 

An important question raised by the bound (27) is how it compares to other quantum Cramer-Rao bounds obtained
in the literature. In this respect, our most important result is that (27) is both a necessary and sufficient condition
that $I(M, \theta)$ must satisfy when the dimensionality of the system $d$ equals 2 and the state is pure. This will be proven
and discussed in detail in section VII.

When $d > 2$, (27) is not a sufficient condition that $I^N(M, \theta)$ must satisfy. To see this let us compare (27) with the
bound derived by Helstrom based on the s.l.d. This bound is the matrix inequality $I^N(M, \theta) \le N H(\theta)$, see (Pq). The
comparison is most easily carried out by defining the matrix $F = \frac{1}{N} H^{-1/2} I^N H^{-1/2} = \sum_{i=1}^{p} \gamma_i f_i \otimes f_i$, where $\gamma_i$ are the
eigenvalues of $F$ and $f_i$ its eigenvectors. Helstrom's bound can be reexpressed as $\gamma_i \le 1$ for all $i$, whereas the bound
(27) states that $\sum_i \gamma_i \le d - 1$. From these expressions it follows that the bound (27) is better than Helstrom's bound
for $d = 2$. For $d > 2$ and $p \le d - 1$ Helstrom's bound is better than (27), as is seen by summing the inequalities $\gamma_i \le 1$
to obtain $\sum_i \gamma_i \le p$. For $p > d - 1$, neither Helstrom's bound nor the bound (27) is better than the other.

Yuen and Laxp^ have proposed another matrix bound based on an asymmetric logarithmic derivative (a.l.d.). 
This bound is known to be worse than the bound based on the s.l.d. in the case of one parameter, but it can be better,
for some loss functions, in the case of two or more parameters. We have however not been able to make a detailed
comparison between the bound based on the a.l.d. and (27).

Although when $d > 2$ the bound (27) is not a sufficient condition, it can be complemented by additional constraints
based on partial traces of $H^{-1} I^N(M, \theta)$, which we now exhibit.

Consider a subset $i' = 1, \ldots, p'$ ($p' < p$) of the parameters. Let $\rho_{,i'}$ be the corresponding derivatives of $\rho(\theta)$. Let
us define the effective dimension $d'$ of the space in which these parameters act at the point $\theta^0$ as follows. Let $\Pi$ be a
projector that commutes with $\rho(\theta^0)$ ($[\Pi, \rho(\theta^0)] = 0$) and such that $\rho_{,i'}$, $i' = 1, \ldots, p'$, acts only within the eigenspace
of $\Pi$ (that is $\Pi \rho_{,i'} \Pi = \rho_{,i'}$). Then $d'$ is the smallest dimension of the eigenspace of such a projector $\Pi$ ($d' = \mathrm{tr}\, \Pi$).
To be more explicit, let us reexpress the definition of $d'$ in coordinates. First we diagonalize $\rho(\theta^0) = \sum_k p_k |k\rangle\langle k|$.
If some $p_k$ are equal this can be done in many ways. The projector $\Pi$ projects onto some of the eigenvectors of $\rho$:
$\Pi = \sum_{k=1}^{d'} |k\rangle\langle k|$. Next we write the operators $\rho_{,i'}$ in this basis: $\rho_{,i'} = \sum_{k,l=1}^{d'} (\rho_{,i'})_{kl} |k\rangle\langle l|$, where the fact that the
indices $k, l$ go from one to $d'$ expresses the fact that $\rho_{,i'}$ acts only within the eigenspace of $\Pi$. Finally we choose the
smallest such $d'$.

We will show that

$$\sum_{i',j'=1}^{p'} (H^{-1})_{i'j'} I^N_{i'j'}(M, \theta^0) \le N(d' - 1) . \tag{65}$$

Before proving this result let us illustrate it by an example. Consider an unknown pure state in $d$ dimensions. In
the neighbourhood of a particular point we can parameterize the state by

$$|\psi\rangle = |1\rangle + (\theta_2 + i\eta_2) |2\rangle + \ldots + (\theta_d + i\eta_d) |d\rangle \tag{66}$$

where the unknown parameters are $\theta_i$ and $\eta_i$, $i = 2, \ldots, d$. There are thus $2d - 2$ parameters. At the point $\theta = \eta = 0$,
$H$ is diagonal in this parameterization: $H_{\theta_i\theta_j} = \delta_{ij}$, $H_{\eta_i\eta_j} = \delta_{ij}$, $H_{\theta_i\eta_j} = 0$. Hence (27) takes the form

$$\sum_i \left[ I^N_{\theta_i\theta_i}(M, \theta = \eta = 0) + I^N_{\eta_i\eta_i}(M, \theta = \eta = 0) \right] \le N(d-1) . \tag{67}$$

But using (65) we also find the constraints

$$I^N_{\theta_i\theta_i}(M, \theta = \eta = 0) + I^N_{\eta_i\eta_i}(M, \theta = \eta = 0) \le N , \quad i = 2, \ldots, d \tag{68}$$

which are stronger than (67) since they must hold separately, but by summing them one obtains (67).

The proof of equation (65) proceeds as in section V. First we can restrict ourselves to POVM's whose elements are
proportional to one dimensional projectors. Secondly, we can restrict ourselves to the subspace $\Pi$ in evaluating (65).
This follows from the inequality

$$I(M)_{i'j'} = \sum_\xi \frac{\mathrm{tr}(\rho_{,i'} M_\xi)\, \mathrm{tr}(\rho_{,j'} M_\xi)}{\mathrm{tr}(\rho M_\xi)}$$

$$= \sum_\xi \frac{\mathrm{tr}(\rho_{,i'} \Pi M_\xi \Pi)\, \mathrm{tr}(\rho_{,j'} \Pi M_\xi \Pi)}{\mathrm{tr}(\rho \Pi M_\xi \Pi) + \mathrm{tr}(\rho (1 - \Pi) M_\xi (1 - \Pi))}$$

$$\le \sum_\xi \frac{\mathrm{tr}(\rho_{,i'} \Pi M_\xi \Pi)\, \mathrm{tr}(\rho_{,j'} \Pi M_\xi \Pi)}{\mathrm{tr}(\rho \Pi M_\xi \Pi)} . \tag{69}$$

Note that equality in (69) holds when the measurement consists of one dimensional projectors and when the POVM
decomposes into the sum of two POVM's acting on the subspaces spanned by $\Pi$ and $1 - \Pi$ separately (i.e., the POVM
elements $M_\xi = |\psi_\xi\rangle\langle\psi_\xi|$ must commute with $\Pi$ and $1 - \Pi$). Thirdly, we can increase the number of parameters from
$p'$ to $d'^2 - 1$. We then introduce exactly as in (46) a parameterization in which the $\rho_{,i}$ are particularly simple, but in
place of (53) we use

$$\sum_{1 \le m' \le d'-1} c_{m'k'} c_{m'l'} = \delta_{k'l'} p_{k'} - \frac{p_{k'} p_{l'}}{\mathrm{tr}\, \Pi\rho} . \tag{70}$$

After these preliminary steps the l.h.s. of (65) is calculated exactly as in subsections V B, V C and V D.



VI. DROPPING THE CONDITION OF UNBIASED ESTIMATORS 
A. Quantum van Trees inequality 

In the previous section we proved a bound on the m.q.e. of unbiased estimators $\hat\theta^N$ based on $N$ copies of the quantum
system $\rho(\theta)$ (with the additional condition that if $\rho$ is mixed the measurement should be separable). In this section
we shall prove Theorems IV and V, which state that under additional conditions it is possible to drop the hypothesis that the
estimator is unbiased.

The starting point for the results in this section is a Bayesian form of the Cramer-Rao inequality, the van Trees 
inequality |£0| , and in particular the multivariate form of the van Trees inequality proven in |gl| . Adapted to the 
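problem of estimating a quantum state, this inequality takes the following form. Let $\hat\theta^N$ be an arbitrary estimator of
the parameter $\theta$ based on a measurement $M$ of the system $\rho^N(\theta)$. Suppose it has m.q.e. matrix $V^N(\theta)$, and Fisher
information matrix $I^N(M, \theta)$. Let $\lambda(\theta)$ be a smooth density supported on a compact region (with smooth boundary)
of the parameter space, and suppose $\lambda$ vanishes on the boundary. By $E_\lambda$ we denote expectation over a random
parameter value $\Theta$ with the probability density $\lambda(\theta)$. Let $C(\theta)$ and $D(\theta)$ be two $p \times p$ matrix valued functions of $\theta$,
the former being symmetric and positive definite. Then the multivariate van Trees inequality reads

$$E_\lambda \mathrm{tr}\, C(\theta) V^N(\theta) \ge \frac{\left( E_\lambda \mathrm{tr}\, D(\theta) \right)^2}{E_\lambda \mathrm{tr}\, C(\theta)^{-1} D(\theta) I^N(M, \theta) D(\theta)^T + \tilde I(\lambda)} \tag{71}$$

where $^T$ denotes the transpose of the matrix and

$$\tilde I(\lambda) = \int d\theta\, \frac{1}{\lambda(\theta)} \sum_{ijkl} \left( C(\theta)^{-1} \right)_{ij} \partial_{\theta_k}\{ D_{ik}(\theta) \lambda(\theta) \}\, \partial_{\theta_l}\{ D_{jl}(\theta) \lambda(\theta) \} . \tag{72}$$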

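As a first application of this inequality we shall prove Theorem V, that is, bound the minimum value, averaged over
$\theta$, of a quadratic cost function. Let $C(\theta)$ be the quadratic cost function. Consider the matrix $W_{opt}(\theta)$ that minimizes
for each value of $\theta$ the cost $\mathrm{tr}\, C(\theta) W(\theta)$ under the condition that $\mathrm{tr}\, H(\theta)^{-1} W(\theta)^{-1} \le d - 1$. One easily finds that

$$W_{opt} = \frac{\mathrm{tr} \sqrt{H^{-1/2} C H^{-1/2}}}{d-1}\, H^{-1/2} \sqrt{H^{1/2} C^{-1} H^{1/2}}\, H^{-1/2} \tag{73}$$

$$= \frac{\mathrm{tr} \sqrt{C^{1/2} H^{-1} C^{1/2}}}{d-1}\, C^{-1/2} \sqrt{C^{1/2} H^{-1} C^{1/2}}\, C^{-1/2} \tag{74}$$

and that

$$\mathrm{tr}\, C W_{opt} = \frac{\left( \mathrm{tr} \sqrt{H^{-1/2} C H^{-1/2}} \right)^2}{d-1} = \frac{\left( \mathrm{tr} \sqrt{C^{1/2} H^{-1} C^{1/2}} \right)^2}{d-1} . \tag{75}$$

We choose in (71) $D(\theta) = C(\theta) W_{opt}(\theta)$. Thus $\mathrm{tr}\, D(\theta) = \mathrm{tr}\, C(\theta) W_{opt}(\theta)$ is given by (75). Note that

$$D(\theta)^T C(\theta)^{-1} D(\theta) = W_{opt}(\theta) C(\theta) W_{opt}(\theta) = \frac{\mathrm{tr}\, C(\theta) W_{opt}(\theta)}{d-1}\, H(\theta)^{-1} . \tag{76}$$

Thus

$$\mathrm{tr}\, D(\theta)^T C(\theta)^{-1} D(\theta) I^N(M, \theta) = \frac{\mathrm{tr}\, C(\theta) W_{opt}(\theta)}{d-1}\, \mathrm{tr}\, H(\theta)^{-1} I^N(M, \theta) \le N\, \mathrm{tr}\, C(\theta) W_{opt}(\theta) . \tag{77}$$

Inserting these expressions into (71) one obtains

$$E_\lambda \mathrm{tr}\, C(\theta) V^N(\theta) \ge \frac{\left( E_\lambda \mathrm{tr}\, C(\theta) W_{opt}(\theta) \right)^2}{N E_\lambda \mathrm{tr}\, C(\theta) W_{opt}(\theta) + \tilde I(\lambda)} \ge \frac{E_\lambda \mathrm{tr}\, C(\theta) W_{opt}(\theta)}{N} - \frac{a}{N^2} \tag{78}$$

where

$$a = \tilde I(\lambda) \tag{79}$$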

is independent of N. This proves that upon averaging over 9 it is impossible (for large N) to improve over the 
minimum cost (pO|). 

B. Asymptotic version of the Cramer-Rao inequality 



We now prove Theorem IV, that is, an asymptotic version of our main inequality (27) which is valid at every point
$\theta$ and does not make the assumption of unbiased estimators. We must however slightly restrict the class of competing
estimators, since otherwise by the phenomenon of super-efficiency we can beat a given estimator at any specific value
of the parameter, though we pay for this by bad behaviour closer and closer to the chosen value as $N$ becomes larger.
The restriction on the class of estimators is that $N$ times their mean quadratic error matrix must converge uniformly
in a neighbourhood of the true value $\theta^0$ of $\theta$ to a limit $W(\theta)$, continuous at $\theta^0$. We assume that both $W(\theta^0)$ and
$H(\theta^0)$ are nonsingular. Furthermore we shall require some mild smoothness conditions on $H(\theta)$ in a neighbourhood
of $\theta^0$: it must be continuous at $\theta^0$ with bounded partial derivatives with respect to the parameter in a neighbourhood
of $\theta^0$. Note that imposing regularity conditions on $H$ is natural since it corresponds to supposing that the $\theta_i$ smoothly
parametrize the allowed density matrices.

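Suppose that as $N \to \infty$

$$N V^N(\theta) \to W(\theta)$$

uniformly in $\theta$ in a neighbourhood of $\theta^0$, with $W$ continuous at $\theta^0$; write $W^0 = W(\theta^0)$. Now in (71) let us make the
following choices for the matrix functions $C$ and $D$:

$$C(\theta) = (W^0)^{-1} H^{-1}(\theta) (W^0)^{-1} ,$$

$$D(\theta) = (W^0)^{-1} H^{-1}(\theta) .$$

Then (71) (multiplied throughout by $N$) and (72) become

$$E_\lambda \mathrm{tr}\, (W^0)^{-1} H^{-1}(\theta) (W^0)^{-1} N V^N(\theta) \ge \frac{\left( E_\lambda \mathrm{tr}\, (W^0)^{-1} H^{-1}(\theta) \right)^2}{\frac{1}{N} E_\lambda \mathrm{tr}\, H^{-1}(\theta) I^N(M, \theta) + \frac{1}{N} \tilde I(\lambda)} \ge \frac{\left( E_\lambda \mathrm{tr}\, (W^0)^{-1} H^{-1}(\theta) \right)^2}{(d-1) + \frac{1}{N} \tilde I(\lambda)} \tag{80}$$

and

$$\tilde I(\lambda) = \int d\theta\, \frac{1}{\lambda(\theta)} \sum_{ijkl} H_{ij}(\theta)\, \partial_{\theta_k}\{ H^{-1}_{ik}(\theta) \lambda(\theta) \}\, \partial_{\theta_l}\{ H^{-1}_{jl}(\theta) \lambda(\theta) \} , \tag{81}$$

where we have used our central inequality (27) to pass to the last member of (80). Now suppose that the quantity (81) is finite (we will
give conditions for that in a moment). By the assumed uniform convergence of $N V^N$ to $W$, upon letting $N \to \infty$
(80) becomes

$$E_\lambda \mathrm{tr}\, (W^0)^{-1} H^{-1}(\theta) (W^0)^{-1} W(\theta) \ge \frac{\left( E_\lambda \mathrm{tr}\, (W^0)^{-1} H^{-1}(\theta) \right)^2}{d-1} . \tag{82}$$

Now suppose the density $\lambda$ in this equation (the probability density of $\theta$) is replaced by an element $\lambda^m$ in a sequence
of densities, concentrating on smaller and smaller neighbourhoods of $\theta^0$ as $m \to \infty$. Assume that $H(\theta)$ is continuous
at $\theta^0$. Recall our earlier assumption that $W(\theta)$ is also continuous at $\theta^0$, with $W^0 = W(\theta^0)$. Then taking the limit as
$m \to \infty$ of (82) yields

$$\mathrm{tr}\, W^{-1}(\theta^0) H^{-1}(\theta^0) \ge \left( \mathrm{tr}\, W^{-1}(\theta^0) H^{-1}(\theta^0) \right)^2 / (d-1)$$

or the required limiting form of (27),

$$\mathrm{tr}\, W^{-1}(\theta^0) H^{-1}(\theta^0) \le d - 1 .$$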

It remains to discuss whether it was reasonable to assume that $\tilde I(\lambda^m)$ is finite (for each $m$ separately). Note that
this quantity only depends on the prior density $\lambda$ and on $H(\theta)$, where $\lambda$ is one of a sequence of densities supported by
smaller and smaller neighbourhoods of $\theta^0$. We already assumed that $H(\theta)$ was continuous at $\theta^0$. It is certainly possible
to specify prior densities $\lambda^m$ concentrating on the ball of radius $1/m$, say, satisfying the smoothness assumptions in
fell and with, for each m, finite Fisher information matrix 

$$\int d\theta\, \frac{1}{\lambda^m(\theta)}\, \partial_{\theta_i}\{ \lambda^m(\theta) \}\, \partial_{\theta_j}\{ \lambda^m(\theta) \} .$$

Consideration of (81) then shows that it suffices further just to assume that $\partial_{\theta_k}\{ H^{-1}_{ik}(\theta) \}$ is, for each $i$, $k$, bounded in
a neighbourhood of $\theta^0$.

In conclusion we have shown that under mild smoothness conditions on $H(\theta)$, the limiting mean quadratic error
matrix $W$ of a sufficiently regular but otherwise arbitrary sequence of estimators must satisfy the asymptotic version
of our central inequality, $\mathrm{tr}\, H^{-1} W^{-1} \le d - 1$.
VII. ATTAINING THE CRAMER-RAO BOUND IN 2 DIMENSIONS

We shall now show that the bounds (27), (29), (31) are sharp in the case of pure states of spin 1/2 systems and of
separable measurements in the case of mixed states of spin 1/2 systems. In particular, in the limit of a large number
of copies $N$ any target scaled m.q.e. matrix $W$ that satisfies $\mathrm{tr}\, H^{-1} W^{-1} \le 1$ can be attained (provided $W$ is non
singular). We shall show this by explicitly constructing a measurement strategy that attains the bound. In section
VI we have already shown that if $\mathrm{tr}\, H^{-1} W^{-1} > 1$, then it cannot be attained.



A. Attaining the bound at a fixed point $\theta^0$

The first step in the proof is to consider the case of one copy of the unknown state ($N = 1$) and fix a particular
point $\theta^0$. Then we show that for any target information matrix $G(\theta^0)$ that satisfies $\mathrm{tr}\, H^{-1}(\theta^0) G(\theta^0) \le 1$, we can build
a measurement $M = M^{\theta^0}$, in general depending on $\theta^0$, such that $I(M^{\theta^0}, \theta^0) = G(\theta^0)$. In the next sections we shall
show how to use this intermediate result to build a measurement and estimation strategy whose asymptotic m.q.e. is
equal to $W(\theta) = G(\theta)^{-1}$ for all $\theta$.

Let us first consider the case of pure states. At $\theta^0$, the state is $|\psi^0\rangle$. We introduce a parameterization $\theta_1$, $\theta_2$ such
that in the vicinity of $|\psi^0\rangle$, the unknown state is

$$|\psi(\theta)\rangle = |\psi^0\rangle + (\theta_1 + i\theta_2) |\psi^1\rangle . \tag{83}$$

Thus the original point $\theta^0$ corresponds to the new $\theta_1 = \theta_2 = 0$. In this parameterization, $H$ is proportional to the
identity at $\theta_1 = \theta_2 = 0$: $H_{\theta_1\theta_1}(0) = H_{\theta_2\theta_2}(0) = 1$, $H_{\theta_1\theta_2}(0) = 0$.

We now diagonalize the matrix $G$. Thus there exist new parameters $\theta'_1 = \cos\lambda\, \theta_1 + \sin\lambda\, \theta_2$, $\theta'_2 = -\sin\lambda\, \theta_1 + \cos\lambda\, \theta_2$
such that $G_{\theta'_1\theta'_1}(0) = g_1 \ge 0$, $G_{\theta'_2\theta'_2}(0) = g_2 \ge 0$, $G_{\theta'_1\theta'_2}(0) = 0$.

In terms of the parameters $\theta'_1$, $\theta'_2$, the unknown state is written

$$|\psi(\theta)\rangle = |\psi^0\rangle + (\theta'_1 + i\theta'_2) |\psi'^1\rangle \tag{84}$$

where $|\psi'^1\rangle = e^{i\lambda} |\psi^1\rangle$.

The POVM $M^{\theta^0}$ consists of measuring the observable $|\psi^0\rangle\langle\psi'^1| + |\psi'^1\rangle\langle\psi^0|$ with probability $g_1$, of measuring the
observable $i(|\psi^0\rangle\langle\psi'^1| - |\psi'^1\rangle\langle\psi^0|)$ with probability $g_2$, and of measuring nothing (or measuring the identity) with
probability $1 - g_1 - g_2$. It is straightforward to verify that the Fisher information at $\theta^0$ in a measurement of the
POVM $M^{\theta^0}$ is equal to $G(\theta^0)$.

Let us now turn to the case of mixed states. We suppose that there are three unknown parameters. We use a
parameterization in which $\rho(\theta) = \frac{1}{2}(I + \theta \cdot \sigma)$, with $\|\theta\| < 1$. Without loss of generality we can suppose that
$\theta^0 = (0, 0, n)$, so that $\rho(\theta^0) = (1/2 + n/2) |1\rangle\langle 1| + (1/2 - n/2) |2\rangle\langle 2| = \frac{1}{2}(I + n\sigma_z)$. The tangent space at $\rho$ is spanned
by the Pauli matrices $\rho_{,x} = \sigma_x/2$ $(= \rho_{,12+}/2)$, $\rho_{,y} = \sigma_y/2$ $(= -\rho_{,12-}/2)$, $\rho_{,z} = \sigma_z/2$ $(= \rho_{,1}/\sqrt{1 - n^2})$, where in parentheses we have
given the relation to the basis used in section V C. In this coordinate system $H(\theta^0)$ is diagonal with eigenvalues 1, 1,
$1/(1 - n^2)$.

Take any symmetric positive matrix $G$ satisfying $\mathrm{tr}\, G H^{-1}(\theta^0) \le 1$. Define the matrix $F = H^{-1/2} G H^{-1/2} = \sum_i \gamma_i f_i \otimes f_i$,
where $\gamma_i$ and $f_i$ are the eigenvalues and eigenvectors of $F$. The condition $\mathrm{tr}\, G H^{-1}(\theta^0) \le 1$ can then be rewritten
$\sum_i \gamma_i \le 1$. If we define $g_i = H^{1/2} f_i$, then we can write $G = \sum_i \gamma_i g_i \otimes g_i$. Denote $m_i = g_i / \|g_i\|$.

Consider the measurement of the spin along the direction $m_i$. This is the POVM consisting of the two projectors
$P_{+m_i} = \frac{1}{2}(I + m_i \cdot \sigma)$ and $P_{-m_i} = \frac{1}{2}(I - m_i \cdot \sigma)$. The information matrix for this measurement is

$$I(P_{\pm m_i})_{kl} = \sum_{\pm} \frac{\mathrm{tr}(P_{\pm m_i} \sigma_k/2)\, \mathrm{tr}(P_{\pm m_i} \sigma_l/2)}{\mathrm{tr}(P_{\pm m_i} \rho)} = \frac{(m_i)_k (m_i)_l}{1 - n^2 (m_i)_z^2} \tag{85}$$

where $(m_i)_k$ is component $k$ of vector $m_i$. Therefore this information matrix is proportional to $g_i \otimes g_i$. One verifies
that it obeys $\mathrm{tr}\, H^{-1} I(P_{\pm m_i}) = 1$, as it must by our findings in section V C since the measurement is exhaustive, $N = 1$,
and $p = d^2 - 1$. Therefore the coefficient of proportionality is 1 and

$$I(P_{\pm m_i}) = g_i \otimes g_i . \tag{86}$$

We now combine such POVM's to obtain the POVM whose elements are

$$\gamma_1 P_{+m_1} , \; \gamma_1 P_{-m_1} , \; \gamma_2 P_{+m_2} , \; \gamma_2 P_{-m_2} , \; \gamma_3 P_{+m_3} , \; \gamma_3 P_{-m_3} , \; (1 - \gamma_1 - \gamma_2 - \gamma_3) I . \tag{87}$$

The information matrix for this measurement is just the sum $\gamma_1 I(P_{\pm m_1}) + \gamma_2 I(P_{\pm m_2}) + \gamma_3 I(P_{\pm m_3}) = \sum_i \gamma_i g_i \otimes g_i = G$.
Thus the POVM (87) attains the target information $G$ at the point $\theta^0$.
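The construction can be illustrated numerically. The sketch below works in the forward direction, as a simplifying assumption: instead of diagonalizing a given $G$, it starts from an orthonormal triple $f_i$ and weights $\gamma_i$, builds $G = \sum_i \gamma_i g_i \otimes g_i$ with $g_i = H^{1/2} f_i$, and compares it with the information matrix of the randomized spin measurement (87), using formula (85) with $\theta^0 = (0, 0, n)$. The numerical values and function names are arbitrary illustrations.

```python
import math


def spin_fisher(m, n):
    """Information matrix (85) for a spin measurement along the unit vector m
    on the state rho = (I + n sigma_z)/2."""
    denom = 1.0 - (n * m[2]) ** 2
    return [[m[k] * m[l] / denom for l in range(3)] for k in range(3)]


def build_and_check(n, gammas, fs):
    """Return (target G, information attained by the POVM (87))."""
    # H(theta^0) = diag(1, 1, 1/(1 - n^2)), so H^{1/2} is diagonal too.
    h_sqrt = [1.0, 1.0, 1.0 / math.sqrt(1.0 - n * n)]
    target = [[0.0] * 3 for _ in range(3)]
    attained = [[0.0] * 3 for _ in range(3)]
    for gamma, f in zip(gammas, fs):
        g = [h_sqrt[k] * f[k] for k in range(3)]        # g_i = H^{1/2} f_i
        norm = math.sqrt(sum(c * c for c in g))
        m = [c / norm for c in g]                       # m_i = g_i / ||g_i||
        info = spin_fisher(m, n)
        for k in range(3):
            for l in range(3):
                target[k][l] += gamma * g[k] * g[l]
                attained[k][l] += gamma * info[k][l]
    return target, attained


n = 0.6
c, s = math.cos(0.7), math.sin(0.7)
fs = [(c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c)]      # orthonormal, illustrative
gammas = [0.5, 0.3, 0.2]                                # sum is at most 1
target, attained = build_and_check(n, gammas, fs)
max_gap = max(abs(target[k][l] - attained[k][l]) for k in range(3) for l in range(3))
# tr H^{-1} G with H^{-1} = diag(1, 1, 1 - n^2); equals the sum of the gammas.
trace_Hinv_G = target[0][0] + target[1][1] + (1.0 - n * n) * target[2][2]
print(max_gap, trace_Hinv_G)
```

For a prescribed $G$ one would instead obtain the $\gamma_i$ and $f_i$ from an eigendecomposition of $H^{-1/2} G H^{-1/2}$; the remaining steps are unchanged.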



B. Attaining the bound for every $\theta$ and arbitrary $N$ by separable measurements

We now prove Theorem VII, which states that in the case of spin half particles we can attain the bound (29) for
every $\theta$. Take a continuous matrix $W(\theta)$, the target scaled m.q.e. matrix, satisfying (29) for every $\theta$. Define
$G(\theta) = W(\theta)^{-1}$, the target scaled information matrix, which satisfies therefore (27). We will show that there exists a
separable measurement and an estimation strategy on $N$ copies of the state $\rho(\theta)$ such that the m.q.e. matrix $V^N$ of
the estimator satisfies

$$V^N(\theta)_{ij} = E\left( (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) \right) = \frac{W_{ij}(\theta)}{N} + o\left( \frac{1}{N} \right) \tag{88}$$

for all $\theta$. In fact this holds uniformly in $\theta$ in a sufficiently small neighbourhood of any given point. This is proven by
constructing explicitly a measurement and estimation strategy that satisfies (|88| ) , following the lines of iQ . 

The measurement and estimation strategy we propose is the following: first take a fraction $N_0 = O(N^a)$ of the
states, for some fixed $0 < a < 1$, and on one third of them measure $\sigma_x$, on one third $\sigma_y$ and on one third $\sigma_z$. One
obtains from each measurement of $\sigma_x$ the outcome $\pm 1$ with probabilities $\frac{1}{2}(1 \pm \theta_x)$, and similarly for $\sigma_y$, $\sigma_z$. Using
this data we make a first estimate of $\theta$, call it $\tilde\theta$, for instance by equating the observed relative frequencies of $\pm 1$ in
the three kinds of measurement to their theoretical values. If the state is pure this determines a first estimate of the
direction of polarization. If the state is mixed it is possible that the initial estimate suggests that the Bloch vector
lies outside the unit sphere. This only occurs with exponentially small probability (in $N_0$) and if this is the case the
measurement is discarded. As discussed below this only affects the mean quadratic error by $o(1/N)$.
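The first stage can be simulated classically, since each spin measurement is just a biased coin. The following sketch is an illustration under assumptions: the true Bloch vector, the exponent $a = 0.8$, the seed and the function name are arbitrary choices, and the pseudo-random generator stands in for the quantum measurement outcomes.

```python
import random


def first_stage_estimate(theta, n0, rng):
    """Preliminary estimate of the Bloch vector from n0 copies.

    One third of the copies is measured along each of x, y, z. Returns None
    when the estimate falls outside the unit sphere (measurement discarded).
    """
    per_axis = n0 // 3
    estimate = []
    for component in theta:
        p_plus = 0.5 * (1.0 + component)      # probability of outcome +1
        plus_count = sum(1 for _ in range(per_axis) if rng.random() < p_plus)
        # Equate the observed frequency of +1 to (1 + theta_component)/2.
        estimate.append(2.0 * plus_count / per_axis - 1.0)
    if sum(c * c for c in estimate) > 1.0:
        return None
    return estimate


rng = random.Random(0)
n_total = 10 ** 6
n0 = int(n_total ** 0.8)                      # N_0 = O(N^a), here with a = 0.8
true_theta = (0.3, -0.2, 0.5)                 # illustrative mixed state
preliminary = first_stage_estimate(true_theta, n0, rng)
print(n0, preliminary)
```

With these numbers roughly 6% of the copies are spent on the first stage, and the statistical error of each component is of order $1/\sqrt{N_0/3}$.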

On the remaining $N' = N - N_0$ states we carry out the measurement $M = M^{\tilde\theta}$ such that $I(M^{\tilde\theta}, \tilde\theta) = G(\tilde\theta)$, which
we have just shown how to construct. Note that $I(M^{\tilde\theta}, \theta) = G(\theta)$ is only guaranteed when $\theta$ is precisely equal to
$\tilde\theta$. Write $I(M, \theta; \tilde\theta)$ for the Fisher information about $\theta$, based on the measurement $M^{\tilde\theta}$ optimal at $\tilde\theta$, while the true
value of the parameter is actually $\theta$. Given $\tilde\theta$, each of the $N'$ second stage measurements represents one draw from
the probability distribution $p(\xi | \theta; \tilde\theta) = \mathrm{tr}\, M^{\tilde\theta}_\xi \rho(\theta)$. We use the classical m.l.e. based on this data only (with $\tilde\theta$ fixed at
its observed value) to estimate the value of $\theta$. Call this estimated value $\hat\theta$.

Let $\epsilon > 0$ be fixed, arbitrarily small. Let $\theta^0$ denote the true value of $\theta$. For given $\delta > 0$ let $B(\theta^0, \delta)$ denote the ball
of radius $\delta$ about $\theta^0$. Fix a convenient matrix norm $\| \cdot \|$. We have the exponential bound

$$\Pr\{ \tilde\theta \in B(\theta^0, \delta) \} \ge 1 - C e^{-D N_0} \tag{89}$$

for some positive numbers $C$ and $D$ (depending on $\delta$). The reason we take $N_0$ proportional to $N^a$ for some $0 < a < 1$
is that this ensures that $C e^{-D N_0} = o(1/N)$.

Modern results [23] on the m.l.e. $\hat\theta$ state that, under certain regularity conditions, the conditional m.q.e. matrix of
$\hat\theta$ satisfies (at $\theta = \theta^0$, and conditional on $\tilde\theta$)

$$V^{N'}(\theta^0; \tilde\theta) = \frac{I(M, \theta^0; \tilde\theta)^{-1}}{N'} + o\left( \frac{1}{N'} \right) \tag{90}$$

uniformly in $\theta^0$. We need however for the next step in our argument that this same result is true uniformly in $\tilde\theta$ for
given $\theta^0$. This could be verified by careful reworking of the proof in [23]. Rather than doing that, we will explicitly
calculate in subsections VII C and VII D the conditional m.q.e. matrix of our estimator and show that it satisfies (90)
uniformly in $\tilde\theta$ in a small enough neighbourhood $B(\theta^0, \delta)$ of $\theta^0$. The 'little o' in (90) refers to the chosen matrix norm.
We will also need that $I(M, \theta^0; \tilde\theta)^{-1}$ is continuous in $\tilde\theta$ at $\tilde\theta = \theta^0$, at which point it equals by our construction
the target scaled m.q.e. $W(\theta^0)$. This is also established in subsection VII C. Therefore, replacing if necessary $\delta$ by a
smaller value, we can guarantee that $I(M, \theta^0; \tilde\theta)^{-1}$ is within $\epsilon$ of $I(M, \theta^0; \theta^0)^{-1} = W(\theta^0)$ for all $\tilde\theta \in B(\theta^0, \delta)$.
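If $\tilde\theta$ is outside the domain $B(\theta^0, \delta)$, then the norm of $V^{N'}(\theta^0; \tilde\theta)$ is bounded by a constant $A$ since $\theta$ belongs to a
compact domain.

Putting everything together we find that

$$\| N' V^N(\theta^0) - W(\theta^0) \| = \left\| \int \left( N' V^{N'}(\theta^0; \tilde\theta) - W(\theta^0) \right) dP(\tilde\theta) \right\|$$

$$\le \int_{B(\theta^0, \delta)} \| N' V^{N'}(\theta^0; \tilde\theta) - W(\theta^0) \|\, dP(\tilde\theta) + A N' C e^{-D N_0}$$

$$= \int_{B(\theta^0, \delta)} \| I(M, \theta^0; \tilde\theta)^{-1} + o(1) - W(\theta^0) \|\, dP(\tilde\theta) + o(1)$$

$$\le \epsilon + o(1) + o(1) .$$

It follows since $N'/N \to 1$ as $N \to \infty$ that $\limsup \| N V^N(\theta^0) - W(\theta^0) \| \le \epsilon$. Since $\epsilon$ was arbitrary, we obtain (88).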



C. Analysis of the conditional mean quadratic error 

We first consider the case of impure states, with the parameterization

$$\rho = \frac{1}{2}(I + \theta \cdot \sigma) , \quad \text{with } \sum_i (\theta_i)^2 < 1 , \tag{91}$$

where we have imposed that the state is never pure. This case turns out to allow the most explicit and straightforward
analysis because the relation between the frequency of the outcomes and the parameters $\theta$ is linear. For other cases
the analysis is more delicate and is discussed in the next subsection. In general, smoothness assumptions will have to
be made on the parameterization $\rho = \rho(\theta)$.



We suppose that $W(\theta)$ is non-singular and continuous in $\theta$. Consequently the $\gamma_i$ (defined in section VII A) depend
continuously on $\theta$ and are all strictly positive at the true value $\theta^0$ of $\theta$.

Given the initial estimate, the second stage measurement can be implemented as follows: for each of the $N' = N - N_0$
observations, independently of one another, with probability $\gamma_i$ measure the projectors $P_{\pm m_i}$, in other words, measure
the spin observable $m_i \cdot \sigma$. With probability $1 - \sum_i \gamma_i$ do nothing.

We emphasize that the $\gamma_i$ and $m_i$ all depend on the initial estimate $\tilde\theta$ through $W(\tilde\theta)$ and $H(\tilde\theta)$. In the following,
all probability calculations are conditional on a given value of $\tilde\theta$.

For simplicity we will modify the procedure in the following two ways: firstly, rather than taking a random number
of each of the three types of measurement, we will take the fixed (expected) numbers $\lfloor \gamma_i N' \rfloor$ (and neglect the difference
between $\lfloor \gamma_i N' \rfloor$ and $\gamma_i N'$). Secondly, we will ignore the constraint $\sum_i (\theta_i)^2 \le 1$. These two modifications make the
maximum likelihood estimator easier to analyze, but do not change its asymptotic m.q.e. Later we will sketch how
to extend the calculations to the original constrained maximum likelihood estimator based on random numbers of
measurements of each observable.

Now measuring $m_i \cdot \sigma$ produces the values $\pm 1$ with probabilities $p_{\pm i} = \frac{1}{2}(1 \pm \theta \cdot m_i)$. Since our data consists of
three binomially distributed counts and we have three parameters $\theta_1$, $\theta_2$, $\theta_3$, the maximum likelihood estimator can be
described, using the invariance of maximum likelihood estimators under 1-1 reparameterization, as follows: set the
theoretical values $p_{\pm i}$ equal to their empirical counterparts (relative frequencies of $\pm 1$ in the $\gamma_i N'$ observations of the
$i$'th spin) and solve the resulting three equations in three unknowns.
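The three equations are linear, $\theta \cdot m_i = 2 p_{+i} - 1$, so they can be solved directly. The sketch below uses Cramer's rule for this; it is a generic linear solve chosen for illustration and does not use the special structure of the $m_i$. The directions and the test vector are hypothetical values.

```python
def det3(a):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def solve_theta(ms, etas):
    """Solve theta . m_i = eta_i (i = 1, 2, 3) by Cramer's rule.

    ms   -- three measurement directions (rows), assumed linearly independent
    etas -- eta_i = 2 * (relative frequency of +1 along m_i) - 1
    """
    base = det3(ms)
    theta = []
    for col in range(3):
        # Replace column `col` of the matrix of directions by the data vector.
        replaced = [[etas[r] if c == col else ms[r][c] for c in range(3)]
                    for r in range(3)]
        theta.append(det3(replaced) / base)
    return theta


ms = [(1.0, 0.0, 0.0), (0.6, 0.8, 0.0), (0.0, 0.6, 0.8)]   # illustrative unit vectors
true_theta = (0.2, -0.1, 0.4)
# Noise-free "frequencies": eta_i = theta . m_i exactly.
etas = [sum(t * c for t, c in zip(true_theta, m)) for m in ms]
recovered = solve_theta(ms, etas)
print(recovered)
```

With empirical frequencies in place of the exact `etas`, the same call returns the unconstrained maximum likelihood estimate described above.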

To be explicit, define $\eta_i = 2p_{+i} - 1 = \theta \cdot m_i$ and let $\hat\eta_i$ be its empirical counterpart. Recall that $m_i = g_i / \|g_i\|$,
$g_i = H^{1/2} f_i$, where the $f_i$ are the orthonormal eigenvectors of $H^{-1/2} G H^{-1/2}$, and where $H$ and $G$ are $H(\tilde\theta)$, $G(\tilde\theta)$,
and $\tilde\theta$ is the preliminary estimate of $\theta$. Then we can rewrite

$$\eta_i = \theta \cdot m_i = \theta \cdot g_i / \|g_i\| = \theta \cdot H^{1/2} f_i / \|H^{1/2} f_i\| = (H^{1/2} \theta) \cdot f_i / \|H^{1/2} f_i\|$$

from which we obtain

$$(H^{1/2} \theta) \cdot f_i = \|H^{1/2} f_i\|\, \eta_i$$

and hence

$$\theta = H^{-1/2} \sum_i \|H^{1/2} f_i\|\, \eta_i\, f_i .$$

The same relation holds between $\hat\theta$ and $\hat\eta$ and yields the sought for expression for $\hat\theta$ in terms of the empirical relative
frequencies.

Observing that the $\hat\eta_i$ are independent with variance $4 p_{+i} p_{-i} / (\gamma_i N') = (1 - (\theta \cdot m_i)^2) / (\gamma_i N')$, the m.q.e. matrix
of $\hat\theta$, conditional on the preliminary estimate $\tilde\theta$, is

$$V^{N'}(\theta^0; \tilde\theta) = \frac{1}{N'} \sum_i \frac{1}{\gamma_i} \left( 1 - \frac{(\theta^0 \cdot H^{1/2} f_i)^2}{\|H^{1/2} f_i\|^2} \right) \|H^{1/2} f_i\|^2\, H^{-1/2} f_i \otimes f_i H^{-1/2} . \tag{92}$$

There is no $o(1/N')$ term here so we do not have to check uniform convergence: the limiting value is attained exactly.
Actually we cheated by replacing $\lfloor \gamma_i N' \rfloor$ by $\gamma_i N'$. This does introduce an $o(1/N')$ error into (92), uniformly in a
neighbourhood of $\theta^0$ in which the $\gamma_i$, which depend on $\tilde\theta$, are bounded away from zero, and $H$ and its inverse are
bounded.

One may verify that (92) reduces to $W(\theta^0)/N'$ at $\tilde\theta = \theta^0$ (indeed at $\theta^0 = \tilde\theta$, $(\theta \cdot H^{1/2} f_i)^2 = \frac{n^2 f_{iz}^2}{1 - n^2}$ and $\|H^{1/2} f_i\|^2 =
\frac{1 - n^2 + n^2 f_{iz}^2}{1 - n^2}$). But this computation is really superfluous since at this point, we are computing the m.q.e. of the
maximum likelihood estimator based on a measurement with, by our construction, Fisher information equal to the
inverse of $W(\theta^0)$. (The modifications to our procedure do not alter the Fisher information). The two quantities must
be equal by the classical large sample results for the maximum likelihood estimator.

We finally need to show the continuity in $\tilde\theta$ at $\tilde\theta = \theta^0$ of $N'$ times the quantity in (92). This is evident if the $\gamma_i$ are
all different at $\theta^0$. Both the eigenvalues and the eigenvectors of $H^{-1/2}GH^{-1/2}$ are then continuous functions of $\tilde\theta$ at $\theta^0$.
There is a potential difficulty however if some $\gamma_i$ are equal to one another at $\tilde\theta = \theta^0$. In this case, the eigenvectors
$f_i$ are not continuous functions of $\tilde\theta$ at this point, and not even uniquely defined there. We argue as follows that
this does not destroy continuity of the mean quadratic error. Consider a sequence of points $\tilde\theta^n$ approaching $\theta^0$. This
generates a sequence of eigenvectors $f_i^n$ and eigenvalues $\gamma_i^n$. The eigenvalues converge to the $\gamma_i$ but the eigenvectors
need not converge at all. However, by compactness of the set of unit vectors in $\mathbb{R}^3$, there is a subsequence along
which the eigenvectors $f_i^n$ converge; and they must converge to a possible choice of eigenvectors at $\theta^0$. Thus along
this subsequence the mean quadratic error (92) does converge to a limit given by the same formula evaluated at
the limiting $f_i$ etc. But this limit is equal by construction to the inverse of the target information matrix $G(\theta^0)$. A
standard argument now shows that the limiting mean quadratic error is continuous at $\tilde\theta = \theta^0$.
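
The standard argument is presumably the subsequence principle. Spelled out as a sketch, where the symbol $v$ for $N'$ times the limiting mean quadratic error is introduced only for this remark:

```latex
Let $v(\tilde\theta)$ denote $N'$ times the limiting m.q.e.\ at the preliminary
estimate $\tilde\theta$, and suppose $v$ were not continuous at $\theta^0$.
Then there exist $\varepsilon > 0$ and a sequence $\tilde\theta^n \to \theta^0$ with
\[
  \bigl\| v(\tilde\theta^n) - G(\theta^0)^{-1} \bigr\| \ge \varepsilon
  \quad \text{for all } n .
\]
By compactness this sequence has a subsequence $\tilde\theta^{n_k}$ along which the
eigenvectors converge, and along it
\[
  v(\tilde\theta^{n_k}) \longrightarrow G(\theta^0)^{-1} ,
\]
which contradicts the displayed inequality. Hence $v$ is continuous at $\theta^0$.
```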

The m.q.e. of $\hat\theta$ given $\tilde\theta$ (times $N'$) therefore converges uniformly in a sufficiently small neighbourhood of $\theta^0$ to a
limit continuous at that point and equal to $W(\theta^0)$ there.

In our derivation of ( pq ) we required the parameter and its estimator to be bounded. By dropping the constraint 
on the length of $\hat\theta$ we have inadvertently lost this property. Suppose we replace our modified estimator $\hat\theta$ by the actual
maximum likelihood estimator respecting the constraint. The two only differ when the unconstrained estimator lies
outside the unit sphere; but this event only occurs with an exponentially small probability, uniformly in $\tilde\theta$, provided
the $\gamma_i$ are uniformly bounded away from zero in the given neighbourhood of $\theta^0$. From this it can be shown that the
mean quadratic error is altered by an amount $o(1/N')$ uniformly in $\tilde\theta$.
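
To illustrate what "exponentially small" means for the binomial counts involved, the sketch below compares the exact probability that a binomial proportion deviates from its mean by at least a fixed amount with Hoeffding's bound $2\exp(-2n\varepsilon^2)$. Hoeffding's inequality is used here as a stand-in; the text does not say which exponential inequality is intended at this point, and the function names and values are hypothetical.

```python
import math


def binomial_log_pmf(n, k, p):
    """Logarithm of P(X = k) for X ~ Binomial(n, p) with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def deviation_probability(n, p, eps):
    """P(|X/n - p| >= eps), summed term by term over the binomial pmf."""
    return sum(math.exp(binomial_log_pmf(n, k, p))
               for k in range(n + 1) if abs(k / n - p) >= eps)


def hoeffding_bound(n, eps):
    """Hoeffding's upper bound on P(|X/n - p| >= eps), valid for every p."""
    return 2.0 * math.exp(-2.0 * n * eps * eps)


if __name__ == "__main__":
    for n in (51, 201, 801):
        print(n, deviation_probability(n, 0.3, 0.1), hoeffding_bound(n, 0.1))
```

The bound does not depend on $p$, which is the kind of uniformity the argument needs.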

If we had worked with random numbers of measurements of each spin variable, when computing the mean quadratic
error we would first have copied the computation above conditional on the numbers of measurements, say $X_i$, of each
spin $m_i$. These numbers are binomially distributed with parameters $N'$ and $\gamma_i$. The conditional mean quadratic error
would be the same as the expression above but with $1/(\gamma_i N')$ replaced by $1/X_i$ (and special provision taken for the
possible outcome $X_i = 0$). So to complete the argument we must show that $E(1/X_i) = 1/(\gamma_i N') + o(1/N')$ uniformly
in $\tilde\theta$. This can also be shown to be true, using the fact that $X_i/N'$ only differs from its mean by more than a fixed
amount with exponentially small probability as $N' \to \infty$ and we restrict attention to $\tilde\theta$ in a neighbourhood of $\theta^0$
where the $\gamma_i$ are bounded away from zero.
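
The approximation $E(1/X_i) = 1/(\gamma_i N') + o(1/N')$ can be checked numerically. The sketch below computes $\gamma N'\,E(1/X)$ from the binomial probabilities, which should approach 1 as $N'$ grows. As an assumption made only for this illustration, the "special provision" for $X = 0$ is to count $1/X$ as $0$ on that event; the function names are hypothetical.

```python
import math


def binomial_log_pmf(n, k, p):
    """Logarithm of P(X = k) for X ~ Binomial(n, p) with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def inverse_moment(n, p):
    """E(1/X) for X ~ Binomial(n, p), with 1/X counted as 0 when X = 0."""
    return sum(math.exp(binomial_log_pmf(n, k, p)) / k for k in range(1, n + 1))


def scaled_inverse_moment(n, p):
    """p * n * E(1/X); the claim in the text is that this tends to 1."""
    return p * n * inverse_moment(n, p)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, scaled_inverse_moment(n, 0.3))
```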

Inspection of our argument shows that the convergence of the mean quadratic error is uniform in $\theta^0$ as long as we
keep away from the boundary of the parameter space.

By the convergence of the normalized binomial distribution to the normal distribution, the representation of the
estimator we gave above also shows that it is asymptotically normally distributed with asymptotic covariance matrix
equal to the target covariance matrix $W$. Moreover, if $X$ has the binomial$(n,p)$ distribution, then $n^{1/2}(X/n - p)$
converges in distribution to the normal with mean zero and variance $p(1-p)$, uniformly in $p$. Thus the convergence
in distribution is also uniform in $\theta^0$ as long as we keep away from the boundary of the parameter space.
D. Conditional mean quadratic error for other models

The preceding subsection gave a complete analysis of the mean quadratic error, given the preliminary estimate $\tilde\theta$,
for the three unknown parameters $\theta_j$ of the parameterization (84). We shall first analyze the mean quadratic error when
the unknown parameters are functions $\phi_i(\theta_j)$ of the parameters $\theta_j$. We shall then consider the important case when
the state is pure and depends on two unknown parameters, and finally the case when the state is pure or mixed and
depends on one unknown parameter, or is mixed and depends on two unknown parameters.

Our first result is that if the change of parameters $\phi_i(\theta_j)$ is locally $C^1$, then the m.q.e. matrix of the $\phi_i$ is obtained
from the m.q.e. of the $\theta_j$ by the Jacobian $\partial\phi_i/\partial\theta_j$, except possibly at isolated points. This follows from the fact that
under a smooth (locally $C^1$) parameterization, the delta method (first order Taylor expansion) allows us to conclude
uniform convergence of the probability distribution of $\sqrt{N}(\hat\phi - \phi)$ to a normal limit with the target mean quadratic
error. If the $\phi_i$ and their derivatives $\partial\phi_i/\partial\theta_j$ are bounded then this proves our claim. If there are points where the
$\phi_i$ or their derivatives $\partial\phi_i/\partial\theta_j$ are infinite, then convergence in distribution does not necessarily imply convergence
of moments. However a truncation device allows one to modify the estimate $\hat\phi$, replacing it by $0$ if any component
is larger than $cN^\alpha$ for given $c$ and $\alpha$ (use the method of [g3|, Lemma II.8.2, together with the exponential inequality
( p9| ) for the multinomial distribution). With this minor modification one can show convergence (uniform in $\phi$ in a
neighbourhood of $\phi^0$) of the moments of the corresponding $\sqrt{N}(\hat\phi - \phi)$ to the moments of its limiting distribution, hence
achievement of the bound in the sense of Theorem IV. In particular if the parameter $\phi$ is bounded then the truncation
is superfluous.
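
The Jacobian rule of the delta method, $W_\phi = J\,W_\theta\,J^T$ with $J_{ij} = \partial\phi_i/\partial\theta_j$, can be sketched as follows. The example is an illustration only: the change of parameters to spherical coordinates, the finite-difference Jacobian and all function names are assumptions made for this sketch, not part of the procedure described above.

```python
import math


def to_spherical(theta):
    """Hypothetical change of parameters: Cartesian (x, y, z) to (r, polar, azimuth)."""
    x, y, z = theta
    r = math.sqrt(x * x + y * y + z * z)
    return [r, math.acos(z / r), math.atan2(y, x)]


def numerical_jacobian(fn, point, step=1e-6):
    """Central-difference approximation of J[i][j] = d fn_i / d point_j."""
    columns = []
    for j in range(len(point)):
        up, down = list(point), list(point)
        up[j] += step
        down[j] -= step
        f_up, f_down = fn(up), fn(down)
        columns.append([(a - b) / (2.0 * step) for a, b in zip(f_up, f_down)])
    # columns[j][i] holds d fn_i / d point_j; transpose into rows indexed by i.
    return [[columns[j][i] for j in range(len(point))]
            for i in range(len(columns[0]))]


def transform_mqe(jac, w):
    """First-order (delta method) m.q.e. of the new parameters: J W J^T."""
    n, m = len(w), len(jac)
    jw = [[sum(jac[i][k] * w[k][j] for k in range(n)) for j in range(n)]
          for i in range(m)]
    return [[sum(jw[i][k] * jac[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]


if __name__ == "__main__":
    theta = [0.3, 0.2, 0.5]
    w_theta = [[0.01 if a == b else 0.0 for b in range(3)] for a in range(3)]
    for row in transform_mqe(numerical_jacobian(to_spherical, theta), w_theta):
        print(row)
```

For an isotropic $W_\theta = w I$ the radial variance stays $w$ and the polar-angle variance becomes $w/r^2$. The azimuthal variance $w/(x^2+y^2)$ blows up on the polar axis, an instance of the isolated points at which the derivatives are unbounded.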

Now turn to the pure state analog of model (|9l|). Obtain a preliminary estimate of the location of $\rho$ on the surface
of the Poincare sphere using the same method as in the mixed case, but always projecting onto the surface of the
sphere. Next, after rotation to transform the preliminary estimate into 'spin up', reparameterize to $\rho = \frac12(1 + \phi\cdot\sigma)$,
where the parameters to be estimated are $(\phi_1, \phi_2) = (\theta_1, \theta_2)$ of the parameterization (84) while $\phi_3 = (1 - \phi_1^2 - \phi_2^2)^{1/2}$.
The preliminary estimate is at $\phi_1 = \phi_2 = 0$. The optimal measurement at this point according to Section VII A
consists of measurements of the spins $\sigma_1$ and $\sigma_2$ on specified proportions of the remaining copies. The resulting
estimator of the parameter $(\phi_1, \phi_2)$ is a linear function of binomial counts and hence its mean quadratic error can
be studied exactly as in section VII C. Then we must transfer back to the originally specified parameterization, for
instance polar coordinates. This is done as in the preceding paragraph. If the transformation is locally $C^1$ then
uniform convergence in distribution to the normal law also transfers back; convergence of mean quadratic error too if
the original parameter space is bounded. Otherwise a truncation might be necessary. In any case, we can exhibit a
procedure optimal in the sense of Theorem IV.

It remains to consider one- and two-dimensional sub-models of the full mixed model, and one-dimensional
sub-models of the full pure model. We suppose that the model specifies a smooth curve or surface in the interior of the
Poincare sphere, or a smooth curve on its surface; smoothly parameterized by a one- or two-dimensional parameter as
appropriate. The first stage of the procedure is just as before, finishing in projection of an estimated density matrix
into the model. Then we reparameterize locally, augmenting the dimension of the parameter to convert the model into
a full mixed or pure model respectively. The target information for the extra parameters is zero. Compute as before
the optimal measurement at this point. Because of the zero values in the target information matrix, there will be zero
eigenvalues $\gamma_i$ in the computation of section VII A. Thus the optimal measurement will involve specified fractions
of measurement of spin in the same number of directions as the dimension of the model. Compute the maximum
likelihood estimator of the original parameters based on this data. If the parameterization is smooth enough the
estimator will yet again achieve the bound of Theorem IV.

VIII. CONCLUSIONS AND OPEN QUESTIONS 

In this paper we have solved some of the theoretical problems that arise when trying to estimate the state of a 
quantum system of which one possesses a large number of copies. This constitutes a preliminary step towards solving 
the question with which Helstrom concluded his book |^: "(...) mathematical statisticians are often concerned with 
asymptotic properties of decision strategies and estimators. (...) When the parameters of a quantum density operator 
are estimated on the basis of many observations, how does the accuracy of the estimates depend on the number 
of observations as that number grows very large? Under what conditions have the estimates asymptotic normal 
distributions? Problems such as these, and still others that doubtless will occur to physicists and mathematicians, 
remain to be solved within the framework of the quantum-mechanical theory." 

In the case of pure states of spin 1/2 particles the problem has been solved. The key result is that in the limit of 
large N, the variance of the estimate is bounded by (28), and the bound can be attained by separate von Neumann
measurements on each particle. 
In the case of mixed states of spin 1/2 particles the state estimation problem for large N has been solved if one
restricts oneself to separable measurements. However if one considers non separable measurements, then one can 
improve the quality of the estimate, which shows that the Fisher information, which in classical statistics is additive, 
is no longer so for quantum state estimation. 

For the case of mixed states of spin 1/2 particles, or for higher spins, we do not know what the "outer" boundary
of the set of (rescaled) achievable Fisher information matrices based on arbitrary (non-separable) measurements of
N systems looks like. We have some indications about the shape of this set (see section V F) and we know that it is
convex and compact.